Azure OpenAI GPT-4 vs Document Intelligence: Comparative Evaluation of Japanese Vertical Text OCR

Overview

We ran OCR on an image of Japanese vertical-writing manuscript paper with two Microsoft Azure services, Azure OpenAI (GPT-4.1 with image input) and Azure Document Intelligence, and compared the results in detail.

Test Image

Image Source: Canva template (400-character manuscript paper)
URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
Image Characteristics:
- 20x20 grid, 400-character manuscript paper
- Vertical writing layout
- Light grid lines (squares)
- Distinction between title section and body section

Ground Truth

1. Recognition Results with Azure OpenAI GPT-4.1

Recognized Text

Evaluation

GPT-4.1 demonstrated the following characteristics for vertical writing manuscript paper:

✅ Correctly recognized the order of title and author name
✅ Accurately recognized the beginning of the body text
✅ Recognized descriptions related to manuscript paper grid squares
✅ Correctly followed the vertical reading order (top to bottom within a column, columns from right to left)
✅ Maintained text continuity

Differences from Ground Truth

“佐藤ちあき” → “佐藤　ちあき” (full-width space added)
- This is a reasonable reading, since the image appears to have a space there
All other text matched the ground truth exactly

Accuracy Rating: 99%
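
The article does not state how the accuracy rating was derived. One common way to put a number on it is character accuracy, defined as 1 minus the edit distance divided by the ground-truth length. The sketch below is an illustration under that assumption; the function names `levenshtein` and `char_accuracy` are hypothetical and not part of either Azure service.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, recognized: str) -> float:
    """Character accuracy: 1 - edit_distance / len(truth), clamped at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - levenshtein(truth, recognized) / len(truth))


if __name__ == "__main__":
    # The only reported difference: a full-width space (U+3000) in the name.
    truth = "佐藤ちあき"
    recognized = "佐藤\u3000ちあき"
    print(levenshtein(truth, recognized))
    print(char_accuracy(truth, recognized))
```

Measured on the five-character name alone, one extra space costs a fifth of the score; measured over the whole page, the same single edit has a much smaller effect, so the text span being scored should always be reported alongside the percentage.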

2. Recognition Results with Azure Document Intelligence

Visualization of Recognized Regions

Evaluation

Document Intelligence demonstrated the following characteristics:

✅ Character recognition capability - Individual characters were accurately recognized (“佐藤”, “ちあき”, “原稿”, etc.)
⚠️ Text fragmentation - Each grid square was processed as an independent element, losing continuity
❌ Vertical reading order issues - Unable to properly handle the right-to-left flow of vertical writing
⚠️ Post-processing required - Some reconstruction is possible using coordinate information
✅ Detailed coordinate information - Position information for each character was perfectly obtained

Accuracy Rating: approximately 80% character recognition accuracy, with difficulty handling the vertical layout

Comparative Analysis

Performance Comparison Table

Evaluation Item	Azure OpenAI GPT-4.1	Document Intelligence
Character Recognition Accuracy	⭐⭐⭐⭐⭐ (99%)	⭐⭐⭐⭐ (80%)
Vertical Writing Support	⭐⭐⭐⭐⭐ Perfect	⭐⭐ Post-processing required
Context Understanding	⭐⭐⭐⭐⭐ Excellent	⭐⭐ Limited
Reading Order Comprehension	⭐⭐⭐⭐⭐ Perfect	⭐⭐ Reconstruction required
Manuscript Paper Support	⭐⭐⭐⭐⭐ Optimal	⭐⭐⭐ Possible with adjustments
Coordinate Information	❌ None	⭐⭐⭐⭐⭐ Detailed retrieval possible
Processing Speed	⭐⭐⭐ ~7 sec/image	⭐⭐⭐⭐⭐ ~3 sec/image
Cost	⭐⭐ Expensive	⭐⭐⭐⭐ Affordable

Visual Comparison

GPT-4.1 Recognition Pattern

Understands the entire image and interprets it as a document
Correctly grasps the vertical writing structure
Extracts only text while ignoring grid squares

Document Intelligence Recognition Pattern

Processes each grid square as an individual text block
Treats vertical columns as “lines”, because its line model is designed for horizontal writing
Reconstruction is possible by leveraging coordinate information
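
The coordinate-based reconstruction mentioned above can be sketched as follows. This is a minimal illustration, not the author's post-processing: it assumes each recognized character has already been reduced to a center point, and the `CharBox` type, the `vertical_reading_order` function, and the `column_tolerance` parameter are all hypothetical names. Characters are grouped into columns by their x position, columns are read from right to left, and each column is read from top to bottom.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    text: str
    x: float  # horizontal center of the character box (assumed precomputed)
    y: float  # vertical center of the character box (assumed precomputed)


def vertical_reading_order(boxes, column_tolerance):
    """Return one string per column, rightmost column first.

    Boxes whose x differs from the first box of the current column by at most
    column_tolerance are treated as belonging to that column.
    """
    columns = []
    for box in sorted(boxes, key=lambda b: -b.x):  # right to left
        if columns and abs(columns[-1][0].x - box.x) <= column_tolerance:
            columns[-1].append(box)
        else:
            columns.append([box])
    # Within each column, read top to bottom (smaller y first).
    return ["".join(b.text for b in sorted(col, key=lambda b: b.y))
            for col in columns]


if __name__ == "__main__":
    # Four characters in two columns, given out of order with slight jitter.
    boxes = [
        CharBox("紙", 61, 30),
        CharBox("原", 100, 10),
        CharBox("用", 59, 10),
        CharBox("稿", 101, 30),
    ]
    print(vertical_reading_order(boxes, column_tolerance=10))
```

On 20x20 manuscript paper, a tolerance of roughly half a grid cell's width is a plausible starting point, though skewed or warped scans would need a more robust column clustering step.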

Conclusion

Key Findings

Overwhelming Superiority of GPT-4.1
- For Japanese vertical writing documents, GPT-4.1 achieves near-perfect recognition
- Correctly understands the document structure, reading order, and context

Document Intelligence Characteristics
- Vertical Japanese was not handled directly in this test; post-processing is required
- High character detection accuracy, but challenges in layout understanding
- Advanced processing is possible by leveraging coordinate information
- Delivers high performance for horizontal writing documents

Practical Recommendations

When to Choose Azure OpenAI GPT-4

📚 Digitization of Japanese vertical writing documents
📖 OCR of historical documents and classical texts
✍️ Manuscript paper processing
🎯 When high-accuracy text extraction is required

When to Choose Document Intelligence

📍 When character position identification is important
🔍 Processing horizontal writing documents
💰 When cost is a priority for bulk processing
⚡ When processing speed is the top priority
🛠️ When advanced customization through post-processing is possible

Technical Considerations

This experiment illustrated the difference in approach between LLM-based vision models (GPT-4) and traditional OCR engines (Document Intelligence):

GPT-4: “Understands” the image as a whole and uses context when reading it. Handles diverse layouts, including vertical writing, flexibly.
Document Intelligence: Specializes in high-precision character detection and coordinate extraction. More advanced processing is possible when it is combined with programmable post-processing.

The two services have different strengths, so the choice should follow the use case. For special layouts such as Japanese vertical writing, GPT-4 currently has the advantage, but Document Intelligence can also handle them if its coordinate information is used for post-processing.

Future Outlook

Expectations for improved vertical Japanese support in Document Intelligence
Improved processing speed and cost reduction for GPT-4
Possibility of a hybrid approach (GPT-4 for character recognition, Document Intelligence for coordinate retrieval)
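
One way such a hybrid could work is to align the text returned by the LLM with the characters detected by the OCR engine and copy the coordinates across wherever the two agree. The sketch below is only an illustration of that idea using Python's standard `difflib`; the function name `attach_coordinates` and the `(char, (x, y))` input shape are assumptions, and the OCR characters are assumed to have already been sorted into reading order.

```python
from difflib import SequenceMatcher


def attach_coordinates(llm_text, ocr_chars):
    """Pair each character of llm_text with a coordinate, or None if unmatched.

    ocr_chars: list of (char, (x, y)) tuples, assumed to be in reading order.
    """
    ocr_text = "".join(ch for ch, _ in ocr_chars)
    result = [(ch, None) for ch in llm_text]
    matcher = SequenceMatcher(None, llm_text, ocr_text, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            # Copy the OCR coordinate onto the matching LLM character.
            result[block.a + k] = (llm_text[block.a + k],
                                   ocr_chars[block.b + k][1])
    return result


if __name__ == "__main__":
    llm_text = "佐藤\u3000ちあき"  # LLM output, with a full-width space
    ocr_chars = [
        ("佐", (100, 10)), ("藤", (100, 30)),
        ("ち", (100, 70)), ("あ", (100, 90)), ("き", (100, 110)),
    ]
    for ch, coord in attach_coordinates(llm_text, ocr_chars):
        print(repr(ch), coord)
```

Characters that only the LLM produced, such as the inserted space, keep `None` as their coordinate, which makes disagreements between the two services easy to spot.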
