Introduction
As digitization of classical texts progresses, there is a growing need to compare and analyze texts from different manuscripts and critical editions. This article introduces “Text Comparison Tool,” a web application that leverages IIIF (International Image Interoperability Framework) manifests to display and compare images and text from two materials side by side.
Demo site: https://iiif-text.vercel.app/
Background and Challenges
Classical texts published in digital archives sometimes have text annotations attached to their IIIF manifests. However, there are few convenient tools for comparing the text of two materials side by side.
For example, when comparing a critical edition and a manuscript of a work, the following tasks are necessary:
- Placing images side by side for visual comparison
- Checking text differences character by character
- Quantitatively understanding the degree of similarity
We aimed to achieve all of these in a single tool.
Three Comparison Modes
The tool compares two materials in three modes.
1. Image Comparison
A high-resolution image viewer built on OpenSeadragon displays images of the two materials side by side. It supports zoom, pan, rotation, and page navigation.
2. Text Diff
Text annotations contained in the IIIF manifest are extracted, and character-level differences are highlighted. Additions are shown in green and deletions in red with strikethrough.
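As an illustration of what a character-level diff produces, here is a minimal, dependency-free sketch based on a longest-common-subsequence table. It is a stand-in for explanation only: the tool itself relies on the diff package listed in the tech stack, and the names DiffPart and diffCharsSketch are hypothetical.

```typescript
// One run of consecutive characters sharing the same status.
type DiffPart = { kind: "equal" | "added" | "removed"; value: string };

// Character-level diff via a longest-common-subsequence (LCS) table.
// Illustrative stand-in, not the algorithm used by the diff package.
function diffCharsSketch(oldText: string, newText: string): DiffPart[] {
  const a = Array.from(oldText); // split by code point, not UTF-16 unit
  const b = Array.from(newText);

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    lcs.push(new Array<number>(b.length + 1).fill(0));
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const parts: DiffPart[] = [];
  // Append a character, merging it into the previous run when the kind matches.
  const push = (kind: DiffPart["kind"], ch: string): void => {
    const last = parts[parts.length - 1];
    if (last && last.kind === kind) last.value += ch;
    else parts.push({ kind, value: ch });
  };

  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      push("equal", a[i]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push("removed", a[i]); // would be rendered in red with strikethrough
      i++;
    } else {
      push("added", b[j]); // would be rendered in green
      j++;
    }
  }
  while (i < a.length) push("removed", a[i++]);
  while (j < b.length) push("added", b[j++]);
  return parts;
}
```

Each returned part can then be mapped to a styled span: green for "added", red with strikethrough for "removed", and unstyled for "equal".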
3. Edit Distance (Levenshtein Distance)
Text similarity is calculated on a per-line basis using Levenshtein distance. Results are visualized as a network graph, where lines with high similarity are connected by edges. A threshold slider allows adjusting the minimum similarity for displayed edges.
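The threshold step can be sketched as a filter over precomputed similarity edges, followed by a conversion into plain node and edge objects of the kind vis-network accepts (id and label for nodes, from and to for edges). The SimilarityEdge shape, the buildGraph name, and the L/R id scheme are assumptions made for this sketch, not the tool's actual code.

```typescript
// Hypothetical input: indices into the left/right line arrays plus a 0..1 similarity.
interface SimilarityEdge {
  left: number;
  right: number;
  similarity: number;
}

interface GraphNode {
  id: string;
  label: string;
  group: "left" | "right";
}

interface GraphEdge {
  from: string;
  to: string;
  label: string;
}

// Keep only edges at or above the threshold and build the matching node list.
function buildGraph(
  leftLines: string[],
  rightLines: string[],
  edges: SimilarityEdge[],
  threshold: number
): { nodes: GraphNode[]; edges: GraphEdge[] } {
  const nodes = new Map<string, GraphNode>();
  const graphEdges: GraphEdge[] = [];

  for (const e of edges) {
    if (e.similarity < threshold) continue; // hidden by the slider
    const from = "L" + e.left;
    const to = "R" + e.right;
    nodes.set(from, { id: from, label: leftLines[e.left], group: "left" });
    nodes.set(to, { id: to, label: rightLines[e.right], group: "right" });
    graphEdges.push({ from, to, label: e.similarity.toFixed(2) });
  }
  return { nodes: Array.from(nodes.values()), edges: graphEdges };
}
```

In this sketch, lines with no edge above the threshold are omitted from the graph entirely. Keeping them as isolated nodes would be an equally reasonable choice.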
Tech Stack
| Category | Technology |
|---|---|
| Framework | Next.js (App Router / Static Export) |
| Language | TypeScript |
| Styling | Tailwind CSS v4 |
| UI Components | Radix UI |
| Image Viewer | OpenSeadragon |
| Network Visualization | vis-network |
| State Management | Zustand |
| Internationalization | next-intl (Japanese / English) |
| Diff Detection | diff |
Architecture
Data Flow
The fetchManifest() function parses IIIF Presentation API v3 manifests and extracts the image URLs and text annotations from each canvas. The extracted data is stored in the Zustand store, and each component subscribes to the store and updates when the data changes.
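A minimal sketch of this kind of extraction follows. It assumes that images are the bodies of painting annotations under each canvas's items, and that text lines are embedded TextualBody annotations under each canvas's annotations property. Externally referenced annotation pages are not resolved. The type names, parseManifest, and fetchManifestSketch are hypothetical and are not taken from the tool's source.

```typescript
// Minimal shapes for the parts of a IIIF Presentation API v3 manifest used here.
interface AnnotationBody { id?: string; type?: string; value?: string }
interface Annotation { motivation?: string | string[]; body?: AnnotationBody | AnnotationBody[] }
interface AnnotationPage { items?: Annotation[] }
interface CanvasV3 { id: string; items?: AnnotationPage[]; annotations?: AnnotationPage[] }
interface ManifestV3 { items?: CanvasV3[] }

interface CanvasData { canvasId: string; imageUrls: string[]; lines: string[] }

// Flatten every annotation body found in a list of annotation pages.
function collectBodies(pages: AnnotationPage[] | undefined): AnnotationBody[] {
  const bodies: AnnotationBody[] = [];
  for (const page of pages ?? []) {
    for (const anno of page.items ?? []) {
      if (Array.isArray(anno.body)) bodies.push(...anno.body);
      else if (anno.body) bodies.push(anno.body);
    }
  }
  return bodies;
}

// Extract image URLs and text lines per canvas (embedded annotations only).
function parseManifest(manifest: ManifestV3): CanvasData[] {
  return (manifest.items ?? []).map((canvas) => {
    const imageUrls: string[] = [];
    for (const body of collectBodies(canvas.items)) {
      if (body.type === "Image" && body.id) imageUrls.push(body.id);
    }
    const lines: string[] = [];
    for (const body of collectBodies(canvas.annotations)) {
      if (body.type === "TextualBody" && typeof body.value === "string") {
        lines.push(body.value);
      }
    }
    return { canvasId: canvas.id, imageUrls, lines };
  });
}

// Thin network wrapper; assumes a runtime with a global fetch.
async function fetchManifestSketch(url: string): Promise<CanvasData[]> {
  const res = await fetch(url);
  if (!res.ok) throw new Error("Failed to fetch manifest: " + res.status);
  return parseManifest((await res.json()) as ManifestV3);
}
```

Keeping the parsing in a pure function separate from the network call makes it straightforward to test against a small hand-written manifest before writing the result into a store.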
Levenshtein Distance Calculation
A custom Levenshtein distance implementation compares every line of the left text with every line of the right text. Each distance is normalized as edit_count / max(string_length_1, string_length_2), which yields a value between 0 and 1. For performance, only the top 10 edges by similarity are displayed in the network graph.
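The calculation can be sketched as follows. The dynamic-programming distance and the normalization follow the description above. Treating similarity as 1 minus the normalized distance, counting length in code points, and the names levenshtein, normalizedDistance, and topEdges are assumptions made for this sketch.

```typescript
// Levenshtein distance with two rolling rows (O(n*m) time, O(m) memory).
function levenshtein(a: string, b: string): number {
  const s = Array.from(a); // compare by code point
  const t = Array.from(b);
  let prev: number[] = [];
  for (let j = 0; j <= t.length; j++) prev.push(j);

  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// edit_count / max(length_1, length_2); two empty strings count as identical.
function normalizedDistance(a: string, b: string): number {
  const maxLen = Math.max(Array.from(a).length, Array.from(b).length);
  return maxLen === 0 ? 0 : levenshtein(a, b) / maxLen;
}

interface LineEdge {
  left: number;
  right: number;
  similarity: number;
}

// Compare every left line with every right line and keep the most similar pairs.
// Assumption: similarity = 1 - normalized distance.
function topEdges(leftLines: string[], rightLines: string[], limit = 10): LineEdge[] {
  const edges: LineEdge[] = [];
  leftLines.forEach((l, i) => {
    rightLines.forEach((r, j) => {
      edges.push({ left: i, right: j, similarity: 1 - normalizedDistance(l, r) });
    });
  });
  return edges.sort((x, y) => y.similarity - x.similarity).slice(0, limit);
}
```

The all-pairs comparison costs one distance computation per pair of lines, so the work grows with the product of the two line counts. Capping the displayed edges keeps the graph readable but does not reduce that cost.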
Sharing and Embedding via URL Parameters
The comparison state is kept in URL parameters, so the same comparison screen can be reproduced simply by sharing the URL.
| Parameter | Description |
|---|---|
| manifest1 / manifest2 | IIIF manifest URLs |
| canvas1 / canvas2 | Specific canvas URLs (optional) |
| label1 / label2 | Display labels (optional) |
| mode | Display mode: 0=image, 1=diff, 2=edit distance |
| embed | 1 for embed mode |
Specifying embed=1 results in a compact display without the header, suitable for embedding in iframes.
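A shareable URL of this form can be built and read back with the standard URL and URLSearchParams APIs. The parameter names below come from the table. The CompareParams type, the two function names, and the fallback to mode 0 when the parameter is missing are assumptions made for this sketch.

```typescript
type Mode = 0 | 1 | 2; // 0=image, 1=diff, 2=edit distance

interface CompareParams {
  manifest1: string;
  manifest2: string;
  canvas1?: string;
  canvas2?: string;
  label1?: string;
  label2?: string;
  mode?: Mode;
  embed?: boolean;
}

// Serialize the comparison state into query parameters on baseUrl.
function buildCompareUrl(baseUrl: string, params: CompareParams): string {
  const url = new URL(baseUrl);
  const q = url.searchParams;
  q.set("manifest1", params.manifest1);
  q.set("manifest2", params.manifest2);
  if (params.canvas1) q.set("canvas1", params.canvas1);
  if (params.canvas2) q.set("canvas2", params.canvas2);
  if (params.label1) q.set("label1", params.label1);
  if (params.label2) q.set("label2", params.label2);
  if (params.mode !== undefined) q.set("mode", String(params.mode));
  if (params.embed) q.set("embed", "1");
  return url.toString();
}

// Read the state back; returns null when either manifest URL is missing.
function parseCompareUrl(href: string): CompareParams | null {
  const q = new URL(href).searchParams;
  const manifest1 = q.get("manifest1");
  const manifest2 = q.get("manifest2");
  if (!manifest1 || !manifest2) return null;

  const rawMode = q.get("mode");
  // Assumption: anything other than "1" or "2" falls back to image mode.
  const mode: Mode = rawMode === "1" ? 1 : rawMode === "2" ? 2 : 0;

  return {
    manifest1,
    manifest2,
    canvas1: q.get("canvas1") ?? undefined,
    canvas2: q.get("canvas2") ?? undefined,
    label1: q.get("label1") ?? undefined,
    label2: q.get("label2") ?? undefined,
    mode,
    embed: q.get("embed") === "1",
  };
}
```

For embedding, the string returned by buildCompareUrl with embed set to true can be used directly as the src of an iframe.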
Deployment as a Static Site
With Next.js’s output: "export" setting, the tool is built as a fully static site. Because no server-side processing is required, it can be deployed to any static hosting service, such as Vercel, Netlify, or GitHub Pages. Setting basePath also allows it to be served from a subdirectory.
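A configuration along these lines might look like the following. This is a sketch rather than the project's actual file: it assumes a Next.js version that accepts a TypeScript config file (next.config.ts), and the basePath value is a hypothetical subdirectory.

```typescript
// next.config.ts
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  // Emit static HTML/CSS/JS at build time (written to the out/ directory by default).
  output: "export",
  // Hypothetical subdirectory; omit when serving from the domain root.
  basePath: "/iiif-text",
};

export default nextConfig;
```

With this in place, running the production build produces a directory of static files that can be uploaded to the hosting service as-is.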
Usage Example: Comparing Tale of Genji Manuscripts
The tool’s demo compares the following two materials:
- Left: Koui Genji Monogatari (Kiritsubo)
- Right: Early modern period horizontal manuscript (held by the National Diet Library)
In text diff mode, you can visually confirm textual differences between the critical edition and the manuscript, and in edit distance mode, you can get an overview of which lines correspond to each other through a network graph.
Conclusion
This tool can compare any two materials that have text annotations attached to their IIIF manifests. It is intended for use in various scenarios, including classical text research in digital archives and verification of OCR results.
The source code is published on GitHub. We welcome feedback and feature requests.