Introduction
In this article, we compare the OCR performance of major LLMs using images of actual Japanese manuscript paper. While most OCR benchmarks target printed, horizontally written documents, we evaluate recognition accuracy on the distinctive format of vertically written Japanese manuscript paper (genkō yōshi), giving a more practical test of each model's Japanese document understanding.
Features of This Verification
- Using the uniquely Japanese manuscript paper format: Verification with images containing complex elements such as characters placed in grid cells, vertical writing layout, and distinctive margin composition
- Assuming practical use cases: Performance evaluation on manuscript paper used in actual writing scenarios such as essays, novels, and academic papers
- Comprehensive comparison of the latest models: GPT-5, GPT-4.1, Gemini 2.5 Pro, Claude Opus 4.1, and Claude Sonnet 4, all evaluated under identical conditions
Verification Overview
Image Used
- Image source: Canva template (400-character manuscript paper)
- URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
- Image characteristics:
  - 20×20 grid, 400-character manuscript paper
  - Vertical writing layout
  - Faint grid lines (cells)
  - Separate title area and body area

Verification Conditions
- Prompt used: “OCR this” (common across all models)
- Parameters: Default settings for each model
- Execution period: September 2025
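As a sketch of how each model can be queried under these conditions, the helper below attaches the manuscript-paper image to the common prompt using the OpenAI-style vision message format (the helper name and payload construction are our illustration; Gemini and Claude accept analogous image parts):

```python
import base64


def build_ocr_request(image_bytes: bytes, model: str, prompt: str = "OCR this") -> dict:
    """Build an OpenAI-style chat request that attaches an image for OCR.

    The message shape follows the OpenAI vision content-part format:
    one text part (the prompt) and one base64 data-URL image part.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# Example payload for one of the tested models (image bytes abbreviated).
req = build_ocr_request(b"\x89PNG", "gpt-4.1")
```

Keeping the prompt and parameters identical across providers, as done here, is what makes the scores comparable.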
Ground Truth Text
Evaluation Method
The accuracy scores in this article are subjective: they jointly weigh character recognition accuracy, layout understanding, and preservation of text structure. They are intended to quantify each model's practical strengths and weaknesses in an easy-to-compare form, not to serve as a formal benchmark metric.
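A complementary, fully objective metric is character error rate (CER): edit distance between the model output and the ground truth, normalized by the reference length. A minimal pure-Python sketch (function names are ours, not part of the verification):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


# One substituted character out of four → CER 0.25.
print(cer("原稿用紙", "原稿用氏"))  # → 0.25
```

CER works per character, so it suits Japanese text, where word segmentation is ambiguous; it does not, however, capture layout errors such as a misplaced author name, which is why the subjective scores above also weigh structure.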
Detailed OCR Results
1st Place: Gemini 2.5 Pro - Accuracy Score: 98/100
Evaluation Points:
- Character recognition: Nearly perfect
- Missing space in author name
- Paragraph composition: Appropriate 2-paragraph structure
- Layout preservation: Excellent
2nd Place: GPT-5 - Accuracy Score: 97/100
Evaluation Points:
- Character recognition: Perfect
- Space present in author name
- No paragraph separation (continuous as 1 paragraph)
- Body text: Completely accurate
3rd Place: GPT-4.1 - Accuracy Score: 92/100
Evaluation Points:
- Body text recognition: Perfect
- Author name position: Incorrectly placed at the end
- Space present in author name
- Paragraph composition: Appropriate
Claude Opus 4.1 - Accuracy Score: 70/100
Evaluation Points:
- Text flow collapsed (cut off at “for essays etc.”)
- Unnatural sentence beginning with “and short essays”
- Missing “characters that fit the grid cells”
Claude Sonnet 4 - Accuracy Score: 65/100
Evaluation Points:
- Misrecognition of “somewhat”
- Word order confusion (“essay please use”)
- Missing opening portion
- Second half: Accurate
Analysis and Discussion
Performance Ranking Summary
| Rank | Model | Accuracy Score | Strengths | Challenges |
|---|---|---|---|---|
| 1 | Gemini 2.5 Pro | 98/100 | Appropriate paragraph composition, character recognition | Missing space in author name |
| 2 | GPT-5 | 97/100 | Perfect character recognition | No paragraph separation |
| 3 | GPT-4.1 | 92/100 | Paragraph composition, body text accuracy | Author name position |
| 4 | Claude Opus 4.1 | 70/100 | Basic recognition | Text structure collapse |
| 5 | Claude Sonnet 4 | 65/100 | Second half recognition | Missing opening, word order confusion |
Challenges Specific to Manuscript Paper
Impact of grid cells
- Misrecognizing grid lines as part of characters
- Forced interpretation of line breaks due to grid cells
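One simple mitigation is to suppress the faint grid lines before OCR by mapping light-gray pixels to white, leaving only the darker ink strokes. A toy sketch on a raw grayscale array (the cutoff value is an assumption to tune per scan):

```python
def suppress_faint_grid(pixels: list[list[int]], cutoff: int = 200) -> list[list[int]]:
    """Map light-gray pixels (faint grid lines) to pure white.

    `pixels` is a grayscale image as nested lists, 0 = black, 255 = white.
    Pixels at or above `cutoff` are treated as grid/background and whitened;
    darker pixels (ink) pass through unchanged.
    """
    return [[255 if p >= cutoff else p for p in row] for row in pixels]


# 210/220 ≈ faint grid lines, 30/40 ≈ ink strokes.
img = [[255, 210, 30], [220, 255, 40]]
print(suppress_faint_grid(img))  # → [[255, 255, 30], [255, 255, 40]]
```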
Complexity of vertical writing layout
- Understanding column movement from right to left
- Interpretation of whitespace between paragraphs
Placement of meta-information
- Positional relationship between title and author name
- Distinction from body text
Characteristic Behaviors by Model
Google (Gemini)
- Most accurately understood the manuscript paper format
- Deep understanding of Japanese document structure
OpenAI (GPT-5, GPT-4.1)
- Stable character recognition capabilities
- Prioritized content accuracy over layout
Anthropic (Claude)
- Struggled with vertical writing layout interpretation
- Opus 4.1 in particular exhibited an unusual text-duplication behavior
Practical Recommendations
Recommended Models by Scenario
Novels and creative manuscripts
- Recommended: Gemini 2.5 Pro
- Reason: Layout preservation and high accuracy
Academic papers and reports
- Recommended: GPT-5
- Reason: Focus on content accuracy
Batch processing / cost-focused
- Recommended: GPT-4.1
- Reason: Sufficient accuracy and balance
Best Practices for Manuscript Paper OCR
- Preprocessing (e.g. enhancing contrast so faint grid lines are not read as strokes)
- Prompt optimization (e.g. stating that the image is vertical manuscript paper)
- Post-processing workflow:
  - Cross-checking with multiple models
  - Automatic removal of duplicated portions
  - Verifying the author name's position
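Cross-checking with multiple models can be automated by picking the transcription with the smallest total edit distance to the others, which sidelines an outlier output. A minimal sketch (the consensus heuristic is our illustration, not the workflow actually used in this verification):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def consensus_pick(outputs: list[str]) -> str:
    """Return the output with the smallest total edit distance to the rest.

    With three or more model outputs, an isolated misrecognition raises
    that output's total distance, so the most agreed-upon text wins.
    """
    return min(outputs, key=lambda o: sum(levenshtein(o, other) for other in outputs))


# Two models agree; the third has a trailing misrecognition.
texts = ["四百字詰め原稿用紙", "四百字詰め原稿用紙", "四百字詰め原稿氏"]
print(consensus_pick(texts))  # → 四百字詰め原稿用紙
```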
Summary
Through OCR verification on the uniquely Japanese manuscript paper format, clear performance differences between LLM models became apparent. Gemini 2.5 Pro and GPT-5 demonstrated high accuracy, showing strengths in paragraph composition and character recognition, respectively. Meanwhile, Claude-series models revealed challenges in understanding vertical writing layouts.
Notes
The results in this verification are based on the simple prompt “OCR this.” Performance of each model can be significantly improved through prompt engineering and parameter adjustments. For example, explicitly specifying that it is “vertical Japanese manuscript paper” or adjusting the temperature parameter may yield more accurate results.
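As a concrete illustration, a more explicit prompt along these lines (hypothetical wording, not tested in this verification) names the format and reading direction up front:

```python
# Hypothetical refined prompt for vertical manuscript-paper OCR.
OCR_PROMPT = (
    "This image is Japanese 400-character manuscript paper (genkō yōshi), "
    "written vertically and read right to left. Transcribe the title, the "
    "author name, and the body text in reading order, preserving paragraph "
    "breaks. Ignore the grid lines."
)

print(OCR_PROMPT)
```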
The results in this article should be used only as a reference, and we recommend exploring optimal settings for your specific use case.
Verification date: September 2025
Image used: Canva template EAFbqUoH7P8
Prompt used: “OCR this”