Overview
We ran OCR on vertically written Japanese manuscript paper using two services provided by Microsoft Azure (Azure OpenAI GPT-4.1 and Azure Document Intelligence) and compared the results in detail.
Test Image
- Image Source: Canva template (400-character manuscript paper)
- URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
- Image Characteristics:
  - 20x20 grid, 400-character manuscript paper
  - Vertical writing layout
  - Light grid lines (squares)
  - Distinction between title section and body section

Ground Truth
1. Recognition Results with Azure OpenAI GPT-4.1
Recognized Text
Evaluation
On the vertically written manuscript paper, GPT-4.1 showed the following behavior:
- ✅ Correctly recognized the order of title and author name
- ✅ Accurately recognized the beginning of the body text
- ✅ Recognized the passages that describe the manuscript paper's grid squares
- ✅ Followed the vertical reading order correctly (top to bottom within each column, columns from right to left)
- ✅ Maintained text continuity
Differences from Ground Truth
- “佐藤ちあき” → “佐藤 ちあき” (full-width space added)
  - This is a reasonable interpretation since the image appears to have a space
- All other text was an exact match
Accuracy Rating: 99%
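The post does not say how the 99% figure was derived. One common way to put a number on this kind of comparison is character accuracy, defined as 1 minus the edit distance divided by the length of the ground truth. The following is a minimal sketch of that metric. The function names `levenshtein` and `char_accuracy` and the `ignore_spaces` option are illustrative assumptions, not part of the original evaluation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, ocr: str, ignore_spaces: bool = False) -> float:
    """Return 1 - (edit distance / len(truth)), clamped to the range [0, 1]."""
    if ignore_spaces:
        # str.isspace() is also true for the full-width space U+3000.
        truth = "".join(ch for ch in truth if not ch.isspace())
        ocr = "".join(ch for ch in ocr if not ch.isspace())
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1.0 - levenshtein(truth, ocr) / len(truth))


# The one difference reported above: a full-width space in the author name.
print(char_accuracy("佐藤ちあき", "佐藤\u3000ちあき"))
print(char_accuracy("佐藤ちあき", "佐藤\u3000ちあき", ignore_spaces=True))
```

On the five-character author name alone, the inserted space counts as one edit. Over a full page of text, the same single edit weighs far less, and with `ignore_spaces=True` it is not counted at all.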
2. Recognition Results with Azure Document Intelligence
Visualization of Recognized Regions

Evaluation
Document Intelligence demonstrated the following characteristics:
- ✅ Character recognition capability - Individual characters were accurately recognized (“佐藤”, “ちあき”, “原稿”, etc.)
- ⚠️ Text fragmentation - Each grid square was processed as an independent element, losing continuity
- ❌ Vertical reading order issues - The output did not follow the flow of vertical writing (top to bottom, columns from right to left)
- ⚠️ Post-processing required - Some reconstruction is possible using coordinate information
- ✅ Detailed coordinate information - Position information was obtained for every recognized character
Accuracy Rating: approximately 80% character recognition accuracy, with the vertical layout not correctly understood
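The post-processing mentioned above can be sketched as follows: group the recognized elements into columns by their horizontal position, order the columns from right to left, then read each column from top to bottom. This is only a sketch under stated assumptions. It assumes each element has already been reduced to its text and the center of its bounding box, and the `Glyph` class, the `vertical_reading_order` function, and the `column_tolerance` parameter are hypothetical names introduced for illustration. The coordinates in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    text: str
    x: float  # horizontal center of the bounding box
    y: float  # vertical center (y grows downward, as in image coordinates)


def vertical_reading_order(glyphs: list[Glyph], column_tolerance: float) -> list[str]:
    """Return one string per column, rightmost column first."""
    columns: list[list[Glyph]] = []
    # Visit glyphs from right to left; a glyph joins the current column when
    # its x is within the tolerance of that column's first (rightmost) glyph.
    for g in sorted(glyphs, key=lambda g: -g.x):
        if columns and abs(columns[-1][0].x - g.x) <= column_tolerance:
            columns[-1].append(g)
        else:
            columns.append([g])
    # Inside each column, read from top to bottom.
    return ["".join(g.text for g in sorted(col, key=lambda g: g.y)) for col in columns]


# Fragments arrive in arbitrary order, as separate elements.
fragments = [
    Glyph("紙", 100.0, 31.0),
    Glyph("原", 200.0, 10.0),
    Glyph("用", 102.0, 12.0),
    Glyph("稿", 198.0, 30.0),
]
print(vertical_reading_order(fragments, column_tolerance=10.0))
```

A tolerance of roughly half a cell width is a reasonable starting point for grid paper, since characters in the same column share nearly the same horizontal center.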
Comparative Analysis
Performance Comparison Table
| Evaluation Item | Azure OpenAI GPT-4.1 | Document Intelligence |
|---|---|---|
| Character Recognition Accuracy | ⭐⭐⭐⭐⭐ (99%) | ⭐⭐⭐⭐ (80%) |
| Vertical Writing Support | ⭐⭐⭐⭐⭐ Perfect | ⭐⭐ Post-processing required |
| Context Understanding | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐ Limited |
| Reading Order Comprehension | ⭐⭐⭐⭐⭐ Perfect | ⭐⭐ Reconstruction required |
| Manuscript Paper Support | ⭐⭐⭐⭐⭐ Optimal | ⭐⭐⭐ Possible with adjustments |
| Coordinate Information | ❌ None | ⭐⭐⭐⭐⭐ Detailed retrieval possible |
| Processing Speed | ⭐⭐⭐ ~7 sec/image | ⭐⭐⭐⭐⭐ ~3 sec/image |
| Cost | ⭐⭐ Expensive | ⭐⭐⭐⭐ Affordable |
Visual Comparison
GPT-4.1 Recognition Pattern
- Understands the entire image and interprets it as a document
- Correctly grasps the vertical writing structure
- Extracts only text while ignoring grid squares
Document Intelligence Recognition Pattern
- Processes each grid square as an individual text block
- Treats each vertical column as a “line”, because its line model is designed for horizontal writing
- Reconstruction is possible by leveraging coordinate information
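Because the test image is a regular 20x20 grid, the coordinates can also be used in a more direct way: snap each recognized element to a grid cell, then read the cells column by column from the right. The sketch below assumes the grid is uniform and that the position and size of the grid area in the image are known. The function `snap_to_grid` and all of its parameters are hypothetical, and the example uses a made-up 2x2 grid for brevity.

```python
def snap_to_grid(glyphs, left, top, width, height, cols=20, rows=20):
    """Map (text, x, y) items to grid cells and return one string per column.

    Columns are returned from right to left; inside a column, cells are read
    from top to bottom. Empty cells contribute nothing to the output.
    """
    cell_w = width / cols
    cell_h = height / rows
    grid = [[""] * rows for _ in range(cols)]  # grid[col][row], col 0 is leftmost
    for text, x, y in glyphs:
        # Clamp so that points slightly outside the grid area still land in a cell.
        col = min(cols - 1, max(0, int((x - left) / cell_w)))
        row = min(rows - 1, max(0, int((y - top) / cell_h)))
        grid[col][row] += text
    return ["".join(grid[col]) for col in range(cols - 1, -1, -1)]


cells = [("稿", 75, 75), ("原", 75, 25), ("紙", 25, 75), ("用", 25, 25)]
print(snap_to_grid(cells, left=0, top=0, width=100, height=100, cols=2, rows=2))
```

Compared with grouping by horizontal distance, this variant needs no tolerance parameter, but it depends on locating the grid area accurately and would not work on paper whose columns are unevenly spaced.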
Conclusion
Key Findings
Clear Advantage of GPT-4.1
- On this vertically written Japanese test image, GPT-4.1 achieved near-perfect recognition
- Correctly understands the document structure, reading order, and context
Document Intelligence Characteristics
- No direct support for vertical Japanese; post-processing is required
- High character detection accuracy, but challenges in layout understanding
- Advanced processing is possible by leveraging coordinate information
- Delivers high performance for horizontal writing documents
Practical Recommendations
When to Choose Azure OpenAI GPT-4.1
- 📚 Digitization of Japanese vertical writing documents
- 📖 OCR of historical documents and classical texts
- ✍️ Manuscript paper processing
- 🎯 When high-accuracy text extraction is required
When to Choose Document Intelligence
- 📍 When character position identification is important
- 🔍 Processing horizontal writing documents
- 💰 When cost is a priority for bulk processing
- ⚡ When processing speed is the top priority
- 🛠️ When advanced customization through post-processing is possible
Technical Considerations
This experiment clearly demonstrated the differences in approach between LLM-based vision models (GPT-4) and traditional OCR engines (Document Intelligence):
- GPT-4: “Understands” images and performs intelligent processing with context awareness. Flexibly handles diverse layouts, including vertical writing.
- Document Intelligence: Specializes in high-precision character detection and coordinate extraction. Advanced processing is possible when combined with programmable post-processing.
The two services have different strengths, so the choice should depend on the use case. For special layouts such as Japanese vertical writing, GPT-4 currently has the advantage, but Document Intelligence can also handle them through post-processing based on coordinate information.
Future Outlook
- Improved support for vertical Japanese in Document Intelligence
- Faster and cheaper processing with GPT-4
- Possibility of a hybrid approach (GPT-4 for character recognition, DI for coordinate retrieval)
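
One way the hybrid approach could work is to treat the GPT-4 output as the trusted text and borrow coordinates from Document Intelligence by aligning the two character sequences. The following is a minimal sketch using Python's standard `difflib.SequenceMatcher`. It assumes the Document Intelligence elements have already been put into reading order and reduced to `(text, box)` pairs; the function `attach_boxes` and the box format `(x0, y0, x1, y1)` are hypothetical, and the coordinates are made up.

```python
from difflib import SequenceMatcher
from typing import Optional


def attach_boxes(llm_text: str, ocr_items: list) -> list:
    """Pair every character of llm_text with the box of the matching OCR element.

    ocr_items is a list of (text, box) pairs in reading order. Characters with
    no counterpart in the OCR output are paired with None.
    """
    ocr_text = ""
    ocr_boxes = []
    for text, box in ocr_items:
        ocr_text += text
        ocr_boxes.extend([box] * len(text))  # every character inherits its element's box

    boxes: list[Optional[tuple]] = [None] * len(llm_text)
    matcher = SequenceMatcher(None, llm_text, ocr_text, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            boxes[block.a + k] = ocr_boxes[block.b + k]
    return list(zip(llm_text, boxes))


llm_text = "佐藤\u3000ちあき"  # text from the vision model, including the extra space
ocr_items = [("佐藤", (0, 0, 10, 20)), ("ちあき", (0, 30, 10, 60))]
for ch, box in attach_boxes(llm_text, ocr_items):
    print(repr(ch), box)
```

Characters that only the vision model produced, such as the full-width space here, end up without a box, which also makes disagreements between the two services easy to spot.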