Development of a Text Comparison Tool Using IIIF Manifests

Introduction

As the digitization of classical texts advances, there is a growing need to compare and analyze the texts of different manuscripts and critical editions. This article introduces “Text Comparison Tool,” a web application that uses IIIF (International Image Interoperability Framework) manifests to display and compare the images and text of two materials side by side.

Demo site: https://iiif-text.vercel.app/

Background and Challenges

Classical texts published in digital archives sometimes have text annotations attached to their IIIF manifests. However, there are few convenient tools for comparing the text of two materials side by side.

For example, when comparing a critical edition and a manuscript of a work, the following tasks are necessary:

Placing images side by side for visual comparison
Checking text differences character by character
Quantitatively understanding the degree of similarity

We aimed to achieve all of these in a single tool.

Three Comparison Modes

The tool offers three comparison modes.

1. Image Comparison

Images of the two materials are displayed side by side in OpenSeadragon, a high-resolution image viewer. Zoom, pan, rotation, and page navigation are supported.
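As an illustration only, the sketch below builds the options object for one of the two viewers. The option names (`id`, `tileSources`, `sequenceMode`, `showRotationControl`, `showNavigator`) are real OpenSeadragon options, but the helper `buildViewerOptions`, the `PageImage` input shape, and the choice between a IIIF Image API `info.json` source and a plain image are assumptions, not the tool's actual configuration.

```typescript
// Hypothetical helper: builds OpenSeadragon options for one side of the comparison.
// A tile source is either a IIIF Image API info.json URL or a single static image.
type TileSource = string | { type: "image"; url: string };

interface ViewerOptions {
  id: string; // id of the DOM element that hosts the viewer
  tileSources: TileSource[];
  sequenceMode: boolean;
  showRotationControl: boolean;
  showNavigator: boolean;
}

// Hypothetical input: one entry per canvas extracted from the manifest.
interface PageImage {
  imageUrl: string;
  serviceId?: string; // IIIF Image API service id, if the manifest provides one
}

function buildViewerOptions(elementId: string, images: PageImage[]): ViewerOptions {
  const tileSources = images.map((img): TileSource =>
    img.serviceId
      ? `${img.serviceId.replace(/\/$/, "")}/info.json` // deep-zoom tiles via the image service
      : { type: "image", url: img.imageUrl }, // fallback: one static image, no tiling
  );
  return {
    id: elementId,
    tileSources,
    sequenceMode: true, // one page at a time, with previous/next buttons
    showRotationControl: true, // adds rotate buttons to the toolbar
    showNavigator: true, // mini-map that helps orientation while zoomed in
  };
}
```

Each side would then get its own viewer, for example `OpenSeadragon(buildViewerOptions("viewer-left", leftImages))`; zoom and pan are OpenSeadragon defaults and need no extra options.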

2. Text Diff

Text annotations contained in the two IIIF manifests are extracted, and the differences between the texts are highlighted character by character. Additions are shown in green, and deletions in red with strikethrough.
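The tool uses the `diff` package for this step. To show what a character-level diff computes without any dependency, the sketch below uses a longest-common-subsequence (LCS) table instead of the package's own algorithm; `diffCharacters`, `toHtml`, and the inline colors are hypothetical and only mirror the green/red convention described above.

```typescript
// Dependency-free, LCS-based character diff (a stand-in for the `diff` package).
type DiffSegment = { kind: "equal" | "added" | "removed"; value: string };

function diffCharacters(left: string, right: string): DiffSegment[] {
  const a = Array.from(left); // split by code point so surrogate pairs stay intact
  const b = Array.from(right);
  const n = a.length;
  const m = b.length;
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const segments: DiffSegment[] = [];
  const push = (kind: DiffSegment["kind"], ch: string) => {
    const last = segments[segments.length - 1];
    if (last && last.kind === kind) last.value += ch; // merge runs of the same kind
    else segments.push({ kind, value: ch });
  };
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      push("equal", a[i]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      push("removed", a[i++]); // character only in the left text
    } else {
      push("added", b[j++]); // character only in the right text
    }
  }
  while (i < n) push("removed", a[i++]);
  while (j < m) push("added", b[j++]);
  return segments;
}

// Renders segments as HTML: green <ins> for additions, red <del> (struck through) for deletions.
function toHtml(segments: DiffSegment[]): string {
  const esc = (s: string) => s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return segments
    .map(({ kind, value }) =>
      kind === "added"
        ? `<ins style="color:green">${esc(value)}</ins>`
        : kind === "removed"
          ? `<del style="color:red">${esc(value)}</del>`
          : esc(value),
    )
    .join("");
}
```

For example, `diffCharacters("いづれの御時", "いつれの御時")` marks `づ` as removed and `つ` as added, with the rest unchanged. With the real package, `diffChars(oldText, newText)` returns a comparable list of change objects flagged `added` or `removed`. The LCS table costs O(n·m) time and memory, which is fine for short passages but grows quickly for long texts.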

3. Edit Distance (Levenshtein Distance)

Text similarity is calculated on a per-line basis using Levenshtein distance. Results are visualized as a network graph, where lines with high similarity are connected by edges. A threshold slider allows adjusting the minimum similarity for displayed edges.
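A minimal sketch of the threshold step, assuming the per-line similarities have already been computed: pairs below the slider value are dropped, and the rest are turned into node and edge objects. The field names (`id`, `label`, `group` for nodes; `from`, `to`, `label`, `value` for edges) are ones vis-network's data sets accept; `buildGraph`, `LinePair`, and the `L0`/`R0` id scheme are hypothetical.

```typescript
// Hypothetical result of comparing one left line with one right line.
interface LinePair {
  left: number; // line index in the left text
  right: number; // line index in the right text
  similarity: number; // 0..1, higher means more similar
}

interface GraphNode { id: string; label: string; group: "left" | "right" }
interface GraphEdge { from: string; to: string; label: string; value: number }

function buildGraph(leftLines: string[], rightLines: string[], pairs: LinePair[], threshold: number) {
  // Keep only pairs at or above the slider's minimum similarity.
  const kept = pairs.filter((p) => p.similarity >= threshold);
  const nodes: GraphNode[] = [
    ...leftLines.map((text, i) => ({ id: `L${i}`, label: text, group: "left" as const })),
    ...rightLines.map((text, i) => ({ id: `R${i}`, label: text, group: "right" as const })),
  ];
  const edges: GraphEdge[] = kept.map((p) => ({
    from: `L${p.left}`,
    to: `R${p.right}`,
    label: p.similarity.toFixed(2), // shown on the edge
    value: p.similarity, // vis-network can scale edge width by `value`
  }));
  return { nodes, edges };
}
```

Because filtering is a cheap pass over precomputed pairs, the graph can be rebuilt on every slider change without recomputing any distances.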

Tech Stack

Category	Technology
Framework	Next.js (App Router / Static Export)
Language	TypeScript
Styling	Tailwind CSS v4
UI Components	Radix UI
Image Viewer	OpenSeadragon
Network Visualization	vis-network
State Management	Zustand
Internationalization	next-intl (Japanese / English)
Diff Detection	diff (npm package, also known as jsdiff)

Architecture

Data Flow

The fetchManifest() function parses IIIF Presentation API v3 manifests and extracts the image URL and text annotations of each canvas. The extracted data is stored in a Zustand store, and each component subscribes to the store so that it re-renders when the data changes.
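The sketch below shows the kind of traversal such a parser performs on Presentation API v3 JSON. It is not the tool's fetchManifest(): `extractCanvases` and the trimmed-down types are hypothetical, and it assumes the text is embedded as `TextualBody` annotations in each canvas's `annotations` list (externally referenced annotation pages would need an extra fetch, which is not handled here).

```typescript
// Only the Presentation API v3 fields used below are typed.
interface AnnoBody { id?: string; type?: string; value?: string }
interface IiifAnnotation { body?: AnnoBody | AnnoBody[] }
interface AnnotationPage { items?: IiifAnnotation[] }
interface IiifCanvas { id: string; items?: AnnotationPage[]; annotations?: AnnotationPage[] }
interface IiifManifest { items?: IiifCanvas[] }

interface CanvasData { canvasId: string; imageUrl?: string; text: string[] }

// An annotation body may be a single object or an array.
const bodiesOf = (a: IiifAnnotation): AnnoBody[] =>
  a.body === undefined ? [] : Array.isArray(a.body) ? a.body : [a.body];

function extractCanvases(manifest: IiifManifest): CanvasData[] {
  return (manifest.items ?? []).map((canvas) => {
    // Painting annotations in canvas.items carry the page image.
    const image = (canvas.items ?? [])
      .flatMap((page) => page.items ?? [])
      .flatMap(bodiesOf)
      .find((b) => b.type === "Image");
    // Assumption: transcriptions are embedded TextualBody annotations in canvas.annotations.
    const text = (canvas.annotations ?? [])
      .flatMap((page) => page.items ?? [])
      .flatMap(bodiesOf)
      .filter((b) => b.type === "TextualBody" && typeof b.value === "string")
      .map((b) => b.value as string);
    return { canvasId: canvas.id, imageUrl: image?.id, text };
  });
}
```

A loader would fetch the manifest URL, pass the parsed JSON to a function like this, and write the result into the store through a setter; that wiring is omitted here.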

Levenshtein Distance Calculation

A custom implementation of the Levenshtein distance compares every line of the left text with every line of the right text (a brute-force, all-pairs comparison). Each distance is normalized to a value between 0 and 1 by dividing it by the length of the longer line: edit_count / max(string_length_1, string_length_2). For performance, only the 10 edges with the highest similarity are displayed in the network graph.
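The following is a minimal sketch of that calculation, not the tool's own code. It uses the standard dynamic-programming recurrence and the normalization given above; `levenshtein`, `normalizedDistance`, and `topEdges` are hypothetical names. Two points are assumptions: similarity is taken as 1 minus the normalized distance, and lengths are counted in code points rather than UTF-16 units.

```typescript
// Standard dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(s: string, t: string): number {
  const a = Array.from(s); // code points, so each kana/kanji counts once
  const b = Array.from(t);
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j); // distance from "" to b[0..j)
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // distance from a[0..i) to ""
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      // deletion, insertion, substitution (or match)
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// edit_count / max(len1, len2), giving a value in [0, 1].
function normalizedDistance(s: string, t: string): number {
  const maxLen = Math.max(Array.from(s).length, Array.from(t).length);
  return maxLen === 0 ? 0 : levenshtein(s, t) / maxLen; // two empty lines are identical
}

interface LineEdge { left: number; right: number; similarity: number }

// Brute-force all-pairs comparison; returns the `limit` most similar pairs.
function topEdges(leftLines: string[], rightLines: string[], limit = 10): LineEdge[] {
  const edges: LineEdge[] = [];
  leftLines.forEach((l, i) =>
    rightLines.forEach((r, j) =>
      edges.push({ left: i, right: j, similarity: 1 - normalizedDistance(l, r) }),
    ),
  );
  return edges.sort((x, y) => y.similarity - x.similarity).slice(0, limit);
}
```

For example, `levenshtein("kitten", "sitting")` is 3, so the normalized distance is 3/7 (about 0.43). The all-pairs loop performs one distance computation per (left line, right line) pair, each costing time proportional to the product of the two line lengths, which is why capping the displayed edges matters for longer texts.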

Sharing and Embedding via URL Parameters

The comparison settings are kept in URL parameters, so the same comparison screen can be reproduced simply by sharing the URL.

Parameter	Description
`manifest1` / `manifest2`	IIIF manifest URLs
`canvas1` / `canvas2`	Specific canvas URLs (optional)
`label1` / `label2`	Display labels (optional)
`mode`	Display mode: `0`=image, `1`=diff, `2`=edit distance
`embed`	`1` for embed mode

Setting `embed=1` produces a compact layout without the header, suitable for embedding in an iframe.
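As a sketch of how these parameters can be read and written with the standard `URLSearchParams` and `URL` APIs: the parameter names come from the table above, while `ComparisonState`, `parseState`, `buildShareUrl`, and the fallback to image mode for missing or invalid `mode` values are hypothetical choices, not the tool's confirmed behavior.

```typescript
type Mode = 0 | 1 | 2; // 0 = image, 1 = diff, 2 = edit distance

interface Side { manifest: string; canvas?: string; label?: string }
interface ComparisonState { left: Side; right: Side; mode: Mode; embed: boolean }

// Reads manifestN / canvasN / labelN for one side (N = 1 or 2).
function readSide(params: URLSearchParams, n: 1 | 2): Side {
  return {
    manifest: params.get(`manifest${n}`) ?? "",
    canvas: params.get(`canvas${n}`) ?? undefined, // optional
    label: params.get(`label${n}`) ?? undefined, // optional
  };
}

function writeSide(params: URLSearchParams, n: 1 | 2, side: Side): void {
  params.set(`manifest${n}`, side.manifest);
  if (side.canvas) params.set(`canvas${n}`, side.canvas);
  if (side.label) params.set(`label${n}`, side.label);
}

function parseState(search: string): ComparisonState {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  const m = Number(params.get("mode"));
  const mode: Mode = m === 1 ? 1 : m === 2 ? 2 : 0; // fall back to image mode
  return { left: readSide(params, 1), right: readSide(params, 2), mode, embed: params.get("embed") === "1" };
}

function buildShareUrl(base: string, state: ComparisonState): string {
  const url = new URL(base);
  writeSide(url.searchParams, 1, state.left);
  writeSide(url.searchParams, 2, state.right);
  url.searchParams.set("mode", String(state.mode));
  if (state.embed) url.searchParams.set("embed", "1"); // compact, header-less layout
  return url.toString(); // values are percent-encoded automatically
}
```

Because parsing and building are inverses, a URL produced by `buildShareUrl` restores the same state when opened; the same URL with `embed=1` can serve as the `src` of an iframe.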

Deployment as a Static Site

With Next.js’s `output: "export"` setting, the tool is built as a fully static site. Since no server-side processing is required, it can be deployed to any static hosting service, such as Vercel, Netlify, or GitHub Pages. Setting `basePath` also allows it to be served from a subdirectory.
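A minimal configuration along these lines might look like the following. `output` and `basePath` are real Next.js options; the `next.config.ts` file name and the `/text-comparison` path are assumptions for illustration only.

```typescript
// next.config.ts: sketch of a static-export setup (basePath value is hypothetical).
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  output: "export", // `next build` writes a fully static site to the out/ directory
  basePath: "/text-comparison", // hypothetical subdirectory; omit when serving from the domain root
};

export default nextConfig;
```

The contents of `out/` can then be uploaded as-is to any static host. When `basePath` is set, the app's URLs include that prefix, so it must match the subdirectory the site is actually served from.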

Usage Example: Comparing Tale of Genji Manuscripts

The tool’s demo compares the following two materials:

Left: Koui Genji Monogatari (Kiritsubo)
Right: Early modern period horizontal manuscript (held by the National Diet Library)

In text diff mode, you can visually confirm textual differences between the critical edition and the manuscript, and in edit distance mode, you can get an overview of which lines correspond to each other through a network graph.

Conclusion

This tool can compare any two materials that have text annotations attached to their IIIF manifests. It is intended for use in various scenarios, including classical text research in digital archives and verification of OCR results.

The source code is published on GitHub. We welcome feedback and feature requests.
