Introduction

In this article, we compared and verified the OCR performance of major LLM models using actual manuscript paper images. While many OCR benchmarks target printed documents and horizontally written text, we evaluate recognition accuracy on the special format of Japanese vertical manuscript paper to more practically verify each model’s Japanese document understanding capabilities.

Features of This Verification

  • Using the uniquely Japanese manuscript paper format: Verification with images containing complex elements such as characters placed in grid cells, vertical writing layout, and distinctive margin composition
  • Assuming practical use cases: Performance evaluation on manuscript paper used in actual writing scenarios such as essays, novels, and academic papers
  • Comprehensive comparison of the latest models: Comparison of the latest models – GPT-5, GPT-4.1, Gemini 2.5 Pro, Claude Opus 4.1, and Claude Sonnet 4 – under identical conditions

Verification Overview

Image Used

  • Image source: Canva template (400-character manuscript paper)
  • URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
  • Image characteristics:
    • 20x20 grid, 400-character manuscript paper
    • Vertical writing layout
    • Faint grid lines (cells)
    • Distinction between title area and body area

Verification Conditions

  • Prompt used: “OCR this” (common across all models)
  • Parameters: Default settings for each model
  • Execution period: September 2025

Ground Truth Text

稿稿 使使使使

Evaluation Method

The accuracy scores in this article are subjective scores that comprehensively evaluate character recognition accuracy, layout understanding, and text structure preservation. From a practical perspective, we have quantified the strengths and challenges of each model in an easy-to-understand manner.

Detailed OCR Results

1st Place: Gemini 2.5 Pro - Accuracy Score: 98/100

稿稿使使使使

Evaluation Points:

  • Character recognition: Nearly perfect
  • Missing space in author name
  • Paragraph composition: Appropriate 2-paragraph structure
  • Layout preservation: Excellent

2nd Place: GPT-5 - Accuracy Score: 97/100

稿稿使使使使

Evaluation Points:

  • Character recognition: Perfect
  • Space present in author name
  • No paragraph separation (continuous as 1 paragraph)
  • Body text: Completely accurate

3rd Place: GPT-4.1 - Accuracy Score: 92/100

稿稿 使使使使

Evaluation Points:

  • Body text recognition: Perfect
  • Author name position: Incorrectly placed at the end
  • Space present in author name
  • Paragraph composition: Appropriate

Claude Opus 4.1 - Accuracy Score: 70/100

稿稿使使使使

Evaluation Points:

  • Text flow collapsed (cut off at “for essays etc.”)
  • Unnatural sentence beginning with “and short essays”
  • Missing “characters that fit the grid cells”

Claude Sonnet 4 - Accuracy Score: 65/100

稿使使使使

Evaluation Points:

  • Misrecognition of “somewhat”
  • Word order confusion (“essay please use”)
  • Missing opening portion
  • Second half: Accurate

Analysis and Discussion

Performance Ranking Summary

RankModelAccuracy ScoreStrengthsChallenges
1Gemini 2.5 Pro98/100Appropriate paragraph composition, character recognitionMissing space in author name
2GPT-597/100Perfect character recognitionNo paragraph separation
3GPT-4.192/100Paragraph composition, body text accuracyAuthor name position
4Claude Opus 4.170/100Basic recognitionText structure collapse
5Claude Sonnet 465/100Second half recognitionMissing opening, word order confusion

Challenges Specific to Manuscript Paper

  1. Impact of grid cells

    • Misrecognizing grid lines as part of characters
    • Forced interpretation of line breaks due to grid cells
  2. Complexity of vertical writing layout

    • Understanding column movement from right to left
    • Interpretation of whitespace between paragraphs
  3. Placement of meta-information

    • Positional relationship between title and author name
    • Distinction from body text

Characteristic Behaviors by Model

Google (Gemini)

  • Most accurately understood the manuscript paper format
  • Deep understanding of Japanese document structure

OpenAI (GPT-5, GPT-4.1)

  • Stable character recognition capabilities
  • Prioritized content accuracy over layout

Anthropic (Claude)

  • Struggled with vertical writing layout interpretation
  • Particularly in Opus 4.1, exhibited peculiar behavior of text duplication

Practical Recommendations

  1. Novels and creative manuscripts

    • Recommended: Gemini 2.5 Pro
    • Reason: Layout preservation and high accuracy
  2. Academic papers and reports

    • Recommended: GPT-5
    • Reason: Focus on content accuracy
  3. Batch processing / cost-focused

    • Recommended: GPT-4.1
    • Reason: Sufficient accuracy and balance

Best Practices for Manuscript Paper OCR

  1. Preprocessing
#---RRCCeeoocslnoootmlrrmuaetmsniotdodenead::djs3Gue0rst0attdymipsenicngatsol:re+h2i0g%her
  1. Prompt optimization
"RwTeihatidhsftirisotmlJeat,phaeanueutsphepoervrenrratimigech,atlatnmodantbuhosedcyrliotpwetexrtpalipenefrtt.,hatorder."
  1. Post-processing workflow

    • Cross-checking with multiple models
    • Automatic removal of duplicate portions
    • Verification of author name position

Summary

Through OCR verification on the uniquely Japanese manuscript paper format, clear performance differences between LLM models became apparent. Gemini 2.5 Pro and GPT-5 demonstrated high accuracy, showing strengths in paragraph composition and character recognition, respectively. Meanwhile, Claude-series models revealed challenges in understanding vertical writing layouts.

Notes

The results in this verification are based on the simple prompt “OCR this.” Performance of each model can be significantly improved through prompt engineering and parameter adjustments. For example, explicitly specifying that it is “vertical Japanese manuscript paper” or adjusting the temperature parameter may yield more accurate results.

The results in this article should be used only as a reference, and we recommend exploring optimal settings for your specific use case.


Verification date: September 2025 Image used: Canva template EAFbqUoH7P8 Prompt used: “OCR this”