Introduction
When digitizing East Asian classical texts, it has become common to mark them up in XML following TEI (Text Encoding Initiative) guidelines. The “TEI Classical Text Viewer” developed by the International Institute of Humanistic Research is a convenient tool that can easily display such TEI/XML files in a browser.
- Official site: https://tei.dhii.jp/teiviewer4eaj
- Web version: https://candra.dhii.jp/nagasaki/tei/tei_viewer/
I customized this viewer so that it displays <gap> tags, which mark illegible sections. This article describes how the customization was done.
Challenge: gap Tags Not Displayed
In digitizing classical texts, sections that cannot be read due to worm damage or deterioration are marked up with <gap> tags.
However, the standard TEI Classical Text Viewer does not render this tag in any visible way. I therefore customized it to display one black square per illegible character and to show the reason for the gap on mouse hover.
Customization Approach
The files of the TEI Classical Text Viewer include the core script app.min.js and the configuration file app_conf.js.
Edits made directly to app.min.js would be lost whenever the core is updated. I therefore implemented the customization by editing only app_conf.js, which keeps it compatible with the core.
Implementation
1. DOM Monitoring with MutationObserver
The TEI Classical Text Viewer parses XML and converts it to DOM. To process <gap> tags after this conversion, MutationObserver is used to monitor DOM changes.
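The monitoring step can be sketched roughly as follows. This is a minimal illustration, not the code used in the viewer: the function names findGapElements and observeGaps are hypothetical, and observing document.documentElement is an assumption about where the converted content appears.

```javascript
// Sketch only: function names and the observed root are assumptions,
// not part of the viewer's API.

// Collect <gap> elements from a newly added node (the node itself and its descendants).
function findGapElements(node) {
  // Text and comment nodes have no querySelectorAll, so they are skipped.
  if (!node || typeof node.querySelectorAll !== "function") return [];
  const found = [];
  if (typeof node.matches === "function" && node.matches("gap")) found.push(node);
  found.push(...node.querySelectorAll("gap"));
  return found;
}

// Watch `root` and call `onGap` for every <gap> element that is added under it.
function observeGaps(root, onGap) {
  const observer = new MutationObserver((mutations) => {
    for (const mutation of mutations) {
      for (const node of mutation.addedNodes) {
        findGapElements(node).forEach(onGap);
      }
    }
  });
  // childList + subtree reports insertions anywhere below root.
  observer.observe(root, { childList: true, subtree: true });
  return observer;
}

// Run only in a browser; the guard keeps the file loadable elsewhere.
if (typeof document !== "undefined" && typeof MutationObserver !== "undefined") {
  observeGaps(document.documentElement, (gap) => console.log("gap found:", gap));
}
```

An observer only reports nodes added after it starts, so if the viewer may already have rendered content, calling findGapElements(document.documentElement) once at startup would cover those elements as well.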
2. Processing gap Tags
When a <gap> tag is detected, it is filled with as many black squares as its quantity attribute specifies, and the value of its reason attribute is set as the tooltip.
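A minimal sketch of this processing step is shown below. The helper names gapPlaceholder and renderGap are hypothetical, the fallback to a single square when quantity is missing is my assumption, and using the title attribute for the tooltip assumes the converted <gap> is an HTML element (title has no effect on elements outside the HTML namespace).

```javascript
// Sketch only: helper names and the one-square fallback are assumptions.
const BLACK_SQUARE = "\u25A0";

// Build the placeholder string for a given quantity attribute value.
function gapPlaceholder(quantity) {
  const count = Number.parseInt(quantity, 10);
  // Fall back to one square when quantity is missing or not a positive integer.
  const squares = Number.isInteger(count) && count > 0 ? count : 1;
  return BLACK_SQUARE.repeat(squares);
}

// Fill a <gap> element with squares and expose its reason as a tooltip.
function renderGap(gapElement) {
  gapElement.textContent = gapPlaceholder(gapElement.getAttribute("quantity"));
  const reason = gapElement.getAttribute("reason");
  // Assumes an HTML-namespace element, where title produces a native tooltip.
  if (reason) gapElement.setAttribute("title", reason);
  return gapElement;
}
```

Keeping the string-building logic in gapPlaceholder, separate from the DOM writes in renderGap, makes it easy to check the quantity handling without a browser.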
Key Point: Attribute Access Method
When the TEI Classical Text Viewer converts XML to HTML, how attributes are handled varies by element. For <gap> tags, XML attributes are preserved as-is, so they can be retrieved directly with getAttribute().
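Because the handling differs from element to element, it is important to check the actual DOM structure with the browser's developer tools before deciding how to read an attribute.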
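As one way to do that check from the developer console, a small helper can print each converted element together with the attributes it actually carries. describeElement is a hypothetical name of mine, not something the viewer provides.

```javascript
// Sketch only: describeElement is a hypothetical debugging helper.
// It formats an element's tag name and attributes as they exist in the DOM.
function describeElement(element) {
  const attributes = Array.from(
    element.attributes,
    (attr) => `${attr.name}="${attr.value}"`
  );
  const suffix = attributes.length ? " " + attributes.join(" ") : "";
  return `<${element.localName}${suffix}>`;
}

// In a browser, list every <gap> currently in the document.
if (typeof document !== "undefined") {
  document.querySelectorAll("gap").forEach((el) => console.log(describeElement(el)));
}
```

If quantity and reason appear in this output under their original names, getAttribute() can be used directly; if they appear under other names, the lookup has to be adjusted accordingly.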
Additional Customizations
Using the same approach, the following features were also added.
Height Specification via GET Parameters
The height of the text display area can now be specified via a URL parameter.
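Reading such a parameter could look roughly like the sketch below. The parameter name height, the pixel unit, and the .text-area selector are all assumptions for illustration; the real names should be taken from the customized code and the viewer's DOM.

```javascript
// Sketch only: the "height" parameter name, the pixel unit, and the
// ".text-area" selector are assumptions.

// Return the requested height in pixels, or null if absent or invalid.
function parseHeightParam(search) {
  const raw = new URLSearchParams(search).get("height");
  // Accept only plain positive integers such as "600".
  if (raw === null || !/^\d+$/.test(raw)) return null;
  const height = Number(raw);
  return height > 0 ? height : null;
}

// In a browser, apply the height if both the parameter and the target exist.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  const height = parseHeightParam(window.location.search);
  const target = document.querySelector(".text-area"); // hypothetical selector
  if (height !== null && target) target.style.height = `${height}px`;
}
```

Rejecting anything that is not a positive integer keeps arbitrary strings from the URL out of the style value. Since the viewer builds its DOM dynamically, the target element may not exist yet when this runs, in which case the assignment would need to be deferred.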
Setting the Page Title
When there are multiple <title> elements in the TEI/XML, the first title is set as the page title (by default, the last title is used).
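Selecting the first title can be sketched as follows. The article does not show how the viewer exposes the parsed XML, so this example parses an XML string itself with DOMParser; firstTitle and applyFirstTitle are hypothetical names.

```javascript
// Sketch only: firstTitle and applyFirstTitle are hypothetical helpers.

// Return the trimmed text of the first <title> element, or null if none exists.
function firstTitle(xmlDocument) {
  const titles = xmlDocument.getElementsByTagName("title");
  if (titles.length === 0) return null;
  const text = (titles[0].textContent || "").trim();
  return text || null;
}

// Browser-only: parse TEI/XML text and use its first title as the page title.
function applyFirstTitle(xmlText) {
  const parsed = new DOMParser().parseFromString(xmlText, "application/xml");
  const title = firstTitle(parsed);
  if (title) document.title = title;
  return title;
}
```

Note that getElementsByTagName("title") matches every <title> in the file in document order, including those inside <sourceDesc>, so "first" here means the first one in the file.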
Improved Display of Bibliographic Information (sourceDesc/bibl)
To format <bibl> elements within <sourceDesc> for better readability, CSS was used to add labels to each element and display them as blocks.
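Because only app_conf.js is edited, one way to apply such CSS is to generate it in JavaScript and inject it as a style element. The sketch below illustrates the idea; the label texts, the child element names, and the sourceDesc bibl selector are assumptions about the rendered DOM, not the rules used in the actual customization.

```javascript
// Sketch only: labels, child element names, and selectors are assumptions.
const BIBL_LABELS = { title: "Title", author: "Author", date: "Date" };

// Build CSS that shows each child of <bibl> as a block with a label in front.
// Labels are assumed to be plain text without quotes or backslashes.
function buildBiblCss(labels) {
  const rules = ["sourceDesc bibl > * { display: block; }"];
  for (const [element, label] of Object.entries(labels)) {
    rules.push(
      `sourceDesc bibl > ${element}::before { content: "${label}: "; font-weight: bold; }`
    );
  }
  return rules.join("\n");
}

// In a browser, inject the generated rules into the page.
if (typeof document !== "undefined") {
  const style = document.createElement("style");
  style.textContent = buildBiblCss(BIBL_LABELS);
  document.head.appendChild(style);
}
```

Generating the labels with ::before keeps them out of the document text itself, so copying text from the page does not pick up anything that is not in the TEI source.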
This produces a readable display like the following:
Summary
The TEI Classical Text Viewer can be flexibly customized by editing the configuration file app_conf.js. The MutationObserver approach introduced here can also be applied to handle other TEI tags.
The customized code is published in the following repository.
Reference Links
- TEI Classical Text Viewer official site: https://tei.dhii.jp/teiviewer4eaj
- TEI Guidelines - gap element: https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-gap.html
- Japanese TEI Guidelines: https://tei.dhii.jp/
Acknowledgments
I would like to express my gratitude to Dr. Kiyonori Nagasaki (International Institute of Humanistic Research) and Mr. Atsushi Honma (Felix Style) for developing and publishing the TEI Classical Text Viewer.