Introduction
In this article, we compare the OCR performance of major LLMs using images of actual Japanese manuscript paper. While most OCR benchmarks target printed, horizontally written documents, we evaluate recognition accuracy on the distinctive format of vertically written Japanese manuscript paper (genkō yōshi), giving a more practical test of each model's Japanese document understanding.
Features of This Verification
- Using the uniquely Japanese manuscript paper format: Verification with images containing complex elements such as characters placed in grid cells, vertical writing layout, and distinctive margin composition
- Assuming practical use cases: Performance evaluation on manuscript paper used in actual writing scenarios such as essays, novels, and academic papers
- Comprehensive comparison of the latest models: GPT-5, GPT-4.1, Gemini 2.5 Pro, Claude Opus 4.1, and Claude Sonnet 4, all evaluated under identical conditions
Verification Overview
Image Used
- Image source: Canva template (400-character manuscript paper)
- URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
- Image characteristics:
  - 20×20 grid, 400-character manuscript paper
  - Vertical writing layout
  - Faint grid lines (cells)
  - Separate title area and body area

Verification Conditions
- Prompt used: “OCR this” (common across all models)
- Parameters: Default settings for each model
- Execution period: September 2025
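As a sketch of how each model can be queried under these conditions, the helper below attaches the manuscript-paper image to the common prompt using the OpenAI-style vision message format (the helper name and payload construction are our illustration; Gemini and Claude accept analogous image parts):

```python
import base64


def build_ocr_request(image_bytes: bytes, model: str, prompt: str = "OCR this") -> dict:
    """Build an OpenAI-style chat request that attaches an image for OCR.

    The message shape follows the OpenAI vision content-part format:
    one text part (the prompt) and one base64 data-URL image part.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# Example payload for one of the tested models (image bytes abbreviated).
req = build_ocr_request(b"\x89PNG", "gpt-4.1")
```

Keeping the prompt and parameters identical across providers, as done here, is what makes the scores comparable.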
Ground Truth Text
Evaluation Method
The accuracy scores in this article are subjective: they jointly weigh character recognition accuracy, layout understanding, and preservation of text structure. They are intended to quantify each model's practical strengths and weaknesses in an easy-to-compare form, not to serve as a formal benchmark metric.
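A complementary, fully objective metric is character error rate (CER): edit distance between the model output and the ground truth, normalized by the reference length. A minimal pure-Python sketch (function names are ours, not part of the verification):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# One substituted character out of four → CER 0.25.
print(cer("原稿用紙", "原稿用氏"))  # → 0.25
```

CER works per character, so it suits Japanese text, where word segmentation is ambiguous; it does not, however, capture layout errors such as a misplaced author name, which is why the subjective scores above also weigh structure.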
Detailed OCR Results
1st Place: Gemini 2.5 Pro - Accuracy Score: 98/100
Evaluation Points:
- Character recognition: Nearly perfect
- Missing space in author name
- Paragraph composition: Appropriate 2-paragraph structure
- Layout preservation: Excellent
2nd Place: GPT-5 - Accuracy Score: 97/100
Evaluation Points:
- Character recognition: Perfect
- Space present in author name
- No paragraph separation (continuous as 1 paragraph)
- Body text: Completely accurate
3rd Place: GPT-4.1 - Accuracy Score: 92/100
Evaluation Points:
- Body text recognition: Perfect
- Author name position: Incorrectly placed at the end
- Space present in author name
- Paragraph composition: Appropriate
Claude Opus 4.1 - Accuracy Score: 70/100
Evaluation Points:
- Text flow collapsed (cut off at “for essays etc.”)
- Unnatural sentence beginning with “and short essays”
- Missing “characters that fit the grid cells”
Claude Sonnet 4 - Accuracy Score: 65/100
Evaluation Points:
- Misrecognition of “somewhat”
- Word order confusion (“essay please use”)
- Missing opening portion
- Second half: Accurate
Analysis and Discussion
Performance Ranking Summary
| Rank | Model | Accuracy Score | Strengths | Challenges |
|---|---|---|---|---|
| 1 | Gemini 2.5 Pro | 98/100 | Appropriate paragraph composition, character recognition | Missing space in author name |
| 2 | GPT-5 | 97/100 | Perfect character recognition | No paragraph separation |
| 3 | GPT-4.1 | 92/100 | Paragraph composition, body text accuracy | Author name position |
| 4 | Claude Opus 4.1 | 70/100 | Basic recognition | Text structure collapse |
| 5 | Claude Sonnet 4 | 65/100 | Second half recognition | Missing opening, word order confusion |
Challenges Specific to Manuscript Paper
Impact of grid cells
- Misrecognizing grid lines as part of characters
- Forced interpretation of line breaks due to grid cells
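One simple mitigation is to suppress the faint grid lines before OCR by mapping light-gray pixels to white, leaving only the darker ink strokes. A toy sketch on a raw grayscale array (the cutoff value is an assumption to tune per scan):

```python
def suppress_faint_grid(pixels: list[list[int]], cutoff: int = 200) -> list[list[int]]:
    """Map light-gray pixels (faint grid lines) to pure white.

    `pixels` is a grayscale image as nested lists, 0 = black, 255 = white.
    Pixels at or above `cutoff` are treated as grid/background and whitened;
    darker pixels (ink) pass through unchanged.
    """
    return [[255 if p >= cutoff else p for p in row] for row in pixels]


# 210/220 ≈ faint grid lines, 30/40 ≈ ink strokes.
img = [[255, 210, 30], [220, 255, 40]]
print(suppress_faint_grid(img))  # → [[255, 255, 30], [255, 255, 40]]
```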
Complexity of vertical writing layout
- Understanding column movement from right to left
- Interpretation of whitespace between paragraphs
Placement of meta-information
- Positional relationship between title and author name
- Distinction from body text
Characteristic Behaviors by Model
Google (Gemini)
- Most accurately understood the manuscript paper format
- Deep understanding of Japanese document structure
OpenAI (GPT-5, GPT-4.1)
- Stable character recognition capabilities
- Prioritized content accuracy over layout
Anthropic (Claude)
- Struggled with vertical writing layout interpretation
- Opus 4.1 in particular exhibited an unusual text-duplication behavior
Practical Recommendations
Recommended Models by Scenario
Novels and creative manuscripts
- Recommended: Gemini 2.5 Pro
- Reason: Layout preservation and high accuracy
Academic papers and reports
- Recommended: GPT-5
- Reason: Focus on content accuracy
Batch processing / cost-focused
- Recommended: GPT-4.1
- Reason: Sufficient accuracy and balance
Best Practices for Manuscript Paper OCR
- Preprocessing (e.g. enhancing contrast so faint grid lines are not read as strokes)
- Prompt optimization (e.g. stating that the image is vertical manuscript paper)
- Post-processing workflow:
  - Cross-checking with multiple models
  - Automatic removal of duplicated portions
  - Verifying the author name's position
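Cross-checking with multiple models can be automated by picking the transcription with the smallest total edit distance to the others, which sidelines an outlier output. A minimal sketch (the consensus heuristic is our illustration, not the workflow actually used in this verification):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def consensus_pick(outputs: list[str]) -> str:
    """Return the output with the smallest total edit distance to the rest.

    With three or more model outputs, an isolated misrecognition raises
    that output's total distance, so the most agreed-upon text wins.
    """
    return min(outputs, key=lambda o: sum(levenshtein(o, other) for other in outputs))


# Two models agree; the third has a trailing misrecognition.
texts = ["四百字詰め原稿用紙", "四百字詰め原稿用紙", "四百字詰め原稿氏"]
print(consensus_pick(texts))  # → 四百字詰め原稿用紙
```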
Summary
Through OCR verification on the uniquely Japanese manuscript paper format, clear performance differences between LLM models became apparent. Gemini 2.5 Pro and GPT-5 demonstrated high accuracy, showing strengths in paragraph composition and character recognition, respectively. Meanwhile, Claude-series models revealed challenges in understanding vertical writing layouts.
Notes
The results in this verification are based on the simple prompt “OCR this.” Performance of each model can be significantly improved through prompt engineering and parameter adjustments. For example, explicitly specifying that it is “vertical Japanese manuscript paper” or adjusting the temperature parameter may yield more accurate results.
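As a concrete illustration, a more explicit prompt along these lines (hypothetical wording, not tested in this verification) names the format and reading direction up front:

```python
# Hypothetical refined prompt for vertical manuscript-paper OCR.
OCR_PROMPT = (
    "This image is Japanese 400-character manuscript paper (genkō yōshi), "
    "written vertically and read right to left. Transcribe the title, the "
    "author name, and the body text in reading order, preserving paragraph "
    "breaks. Ignore the grid lines."
)

print(OCR_PROMPT)
```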
The results in this article should be used only as a reference, and we recommend exploring optimal settings for your specific use case.
Verification date: September 2025
Image used: Canva template EAFbqUoH7P8
Prompt used: “OCR this”