Overview

We ran OCR on a vertically written Japanese manuscript-paper page with two Microsoft Azure services, Azure OpenAI GPT-4.1 and Azure Document Intelligence, and compared the results in detail.

Test Image

  • Image Source: Canva template (400-character manuscript paper)
  • URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
  • Image Characteristics:
    • 20x20 grid, 400-character manuscript paper
    • Vertical writing layout
    • Light grid lines (squares)
    • Separate title and body sections

Ground Truth

稿稿使使使使

1. Recognition Results with Azure OpenAI GPT-4.1

Recognized Text

稿稿 使使使使

Evaluation

On the vertically written manuscript paper, GPT-4.1 showed the following characteristics:

  • Correctly recognized the order of title and author name
  • Accurately recognized the beginning of the body text
  • Recognized descriptions related to manuscript paper grid squares
  • Perfectly understood the vertical reading order (right to left)
  • Maintained text continuity

Differences from Ground Truth

  • “佐藤ちあき” → “佐藤 ちあき” (full-width space added)
    • This is a reasonable interpretation since the image appears to have a space
  • All other text was an exact match

Accuracy Rating: 99%
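
The article does not state how the accuracy rating was calculated. As a hedged illustration, one common way to score OCR output is character-level accuracy based on edit distance. The sketch below assumes that metric; the function names `levenshtein` and `char_accuracy` are hypothetical helpers, and the sample strings are the author-name fragment quoted above, not the full page.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, recognized: str) -> float:
    """1 - (edit distance / ground-truth length), clamped at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - levenshtein(truth, recognized) / len(truth))


truth = "佐藤ちあき"
recognized = "佐藤\u3000ちあき"  # full-width space inserted after the family name

print(levenshtein(truth, recognized))    # 1
print(char_accuracy(truth, recognized))  # 0.8
```

On a five-character fragment a single inserted space costs 20 points; over a full page the same single edit has a proportionally smaller effect, which is how one extra space can still leave the overall score close to 100%.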

2. Recognition Results with Azure Document Intelligence

Visualization of Recognized Regions

Evaluation

Document Intelligence demonstrated the following characteristics:

  • Character recognition capability: individual characters were recognized accurately (“佐藤”, “ちあき”, “原稿”, etc.)
  • ⚠️ Text fragmentation: each grid square was processed as an independent element, so continuity between characters was lost
  • Vertical reading order issues: the right-to-left flow of vertical writing was not handled properly
  • ⚠️ Post-processing required: some reconstruction is possible using coordinate information
  • Detailed coordinate information: position information was obtained for every character

Accuracy Rating: approximately 80% for character recognition, with difficulties in understanding the vertical layout
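
The coordinate-based reconstruction mentioned above can be sketched as follows. This is a minimal illustration, not the post-processing used in the experiment: it assumes each recognized character has already been reduced to the centre point of its bounding box, and the `Glyph` class, the `column_tolerance` parameter, and the sample coordinates are all made up for the example. Extracting centres from the actual Document Intelligence response is left out.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    text: str
    x: float  # horizontal centre of the bounding box (pixels, assumed)
    y: float  # vertical centre of the bounding box (pixels, assumed)


def vertical_reading_order(glyphs: list[Glyph], column_tolerance: float) -> str:
    """Join glyphs in vertical-writing order: columns from right to left,
    and top to bottom inside each column. Columns are separated by newlines."""
    columns: list[list[Glyph]] = []
    # Sweep from the right edge. A glyph joins the current column when its
    # x-centre is within the tolerance of that column's first (rightmost) glyph.
    for g in sorted(glyphs, key=lambda g: -g.x):
        if columns and abs(columns[-1][0].x - g.x) <= column_tolerance:
            columns[-1].append(g)
        else:
            columns.append([g])
    return "\n".join(
        "".join(g.text for g in sorted(col, key=lambda g: g.y))
        for col in columns
    )


# Shuffled input with invented coordinates: a right-hand column and a left-hand column.
glyphs = [
    Glyph("藤", 340, 220), Glyph("稿", 380, 40), Glyph("き", 341, 300),
    Glyph("原", 382, 20), Glyph("佐", 340, 200), Glyph("あ", 339, 280),
    Glyph("ち", 340, 260),
]
print(vertical_reading_order(glyphs, column_tolerance=10))
# 原稿
# 佐藤ちあき
```

On grid paper the column pitch is fixed, so a tolerance of roughly half a square width should separate columns cleanly; free-form handwriting would likely need a more robust clustering step.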

Comparative Analysis

Performance Comparison Table

Evaluation Item | Azure OpenAI GPT-4.1 | Document Intelligence
Character Recognition Accuracy | ⭐⭐⭐⭐⭐ (99%) | ⭐⭐⭐⭐ (80%)
Vertical Writing Support | ⭐⭐⭐⭐⭐ Perfect | ⭐⭐ Post-processing required
Context Understanding | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐ Limited
Reading Order Comprehension | ⭐⭐⭐⭐⭐ Perfect | ⭐⭐ Reconstruction required
Manuscript Paper Support | ⭐⭐⭐⭐⭐ Optimal | ⭐⭐⭐ Possible with adjustments
Coordinate Information | ❌ None | ⭐⭐⭐⭐⭐ Detailed retrieval possible
Processing Speed | ⭐⭐⭐ ~7 sec/image | ⭐⭐⭐⭐⭐ ~3 sec/image
Cost | ⭐⭐ Expensive | ⭐⭐⭐⭐ Affordable

Visual Comparison

GPT-4.1 Recognition Pattern

  • Understands the entire image and interprets it as a document
  • Correctly grasps the vertical writing structure
  • Extracts only text while ignoring grid squares

Document Intelligence Recognition Pattern

  • Processes each grid square as an individual text block
  • Treats vertical columns as “lines”, because its line model is designed for horizontal writing
  • Reconstruction is possible by leveraging coordinate information

Conclusion

Key Findings

  1. Overwhelming Superiority of GPT-4.1

    • For Japanese vertical writing documents, GPT-4.1 achieves near-perfect recognition
    • Correctly understands the document structure, reading order, and context
  2. Document Intelligence Characteristics

    • No direct support for vertical Japanese; post-processing is required
    • High character detection accuracy, but challenges in layout understanding
    • Advanced processing is possible by leveraging coordinate information
    • Delivers high performance for horizontal writing documents

Practical Recommendations

When to Choose Azure OpenAI GPT-4

  • 📚 Digitization of Japanese vertical writing documents
  • 📖 OCR of historical documents and classical texts
  • ✍️ Manuscript paper processing
  • 🎯 When high-accuracy text extraction is required

When to Choose Document Intelligence

  • 📍 When character position identification is important
  • 🔍 Processing horizontal writing documents
  • 💰 When cost is a priority for bulk processing
  • When processing speed is the top priority
  • 🛠️ When advanced customization through post-processing is possible

Technical Considerations

This experiment clearly demonstrated the differences in approach between LLM-based vision models (GPT-4) and traditional OCR engines (Document Intelligence):

  • GPT-4: “Understands” images and performs intelligent processing with context awareness. Flexibly handles diverse layouts including vertical writing
  • Document Intelligence: Specializes in high-precision character detection and coordinate extraction. Advanced processing is possible when combined with programmable post-processing

The two services have different strengths, so the choice should follow the use case. For special layouts such as Japanese vertical writing, GPT-4 currently has the advantage, but Document Intelligence can also handle them when its coordinate information is used for post-processing.

Future Outlook

  • Expectations for improved vertical Japanese support in Document Intelligence
  • Improved processing speed and cost reduction for GPT-4
  • Possibility of a hybrid approach (GPT-4 for character recognition, Document Intelligence for coordinate retrieval)
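
One way the hybrid idea could work is to align the text from the LLM with the per-character positions from the OCR engine. The sketch below is only an assumption about how such a merge might look: it uses `difflib.SequenceMatcher` from the Python standard library, expects the OCR characters to be already in reading order with one character per entry, and the `attach_boxes` helper and the sample coordinates are hypothetical.

```python
from difflib import SequenceMatcher


def attach_boxes(llm_text: str, ocr_chars: list[str], ocr_boxes: list[tuple]) -> list[tuple]:
    """Pair every character of llm_text with the box of the OCR character it
    aligns with, or with None when the OCR output has no counterpart."""
    if len(ocr_chars) != len(ocr_boxes):
        raise ValueError("ocr_chars and ocr_boxes must have the same length")
    boxes = [None] * len(llm_text)
    matcher = SequenceMatcher(None, llm_text, "".join(ocr_chars), autojunk=False)
    for block in matcher.get_matching_blocks():
        # block.a indexes llm_text, block.b indexes the OCR character sequence.
        for k in range(block.size):
            boxes[block.a + k] = ocr_boxes[block.b + k]
    return list(zip(llm_text, boxes))


llm_text = "佐藤\u3000ちあき"    # LLM output, including the inserted full-width space
ocr_chars = list("佐藤ちあき")   # OCR characters in reading order
ocr_boxes = [(340, 200), (340, 220), (340, 260), (340, 280), (340, 300)]  # invented centres

for ch, box in attach_boxes(llm_text, ocr_chars, ocr_boxes):
    print(repr(ch), box)
```

Characters that only the LLM produced, such as the inserted space, end up with `None` as their position, which makes disagreements between the two services easy to spot.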