TEI ODD File Customization: A Case Study with NDL Classical Book OCR

Introduction TEI (Text Encoding Initiative) is an international standard for digitizing and sharing texts in humanities research. This article introduces the process of customizing a TEI ODD file to match the output format of the NDL Classical Book OCR-Lite application. ODD (One Document Does it all) is a mechanism for customizing TEI schemas, allowing you to define your own schema containing only the necessary elements and attributes. Background: Development of the NDL Classical Book OCR-Lite Application I am creating an application that outputs NDL Classical Book OCR-Lite results in TEI/XML. The purpose of this application is to perform OCR on Japanese classical books and output the results in standard TEI format. ...

September 5, 2025 · 22 min · Nakamura

Converting ODD to RNG/HTML Using the TEI Garage API

Introduction Generating schemas (RNG) and documentation (HTML) from TEI (Text Encoding Initiative) ODD (One Document Does it all) files is an important process in TEI projects. This article analyzes the mechanism of the TEI Garage API used internally by Roma (the TEI ODD editor) and introduces how to call the API directly from scripts to convert ODD files. What is TEI Garage? TEI Garage is a web service provided by the TEI community that can perform conversions between various formats. It provides the following capabilities for processing ODD files in particular: ...

September 3, 2025 · 33 min · Nakamura

Development of the NDL Kotenseki OCR-Lite Next.js Version

Overview @yuta1984 developed a “WebAssembly-based web port of NDL Kotenseki OCR-Lite”: https://github.com/yuta1984/ndlkotenocr-lite-web Using the above repository as a reference, I created a Next.js version: https://nkol.vercel.app/ja/ In addition, the following features have been added: an IIIF manifest file input form; TEI/XML file download functionality; and creation of an ODD file for the output format. Usage As an example, we use the Tale of Genji from the Kyushu University Library: https://catalog.lib.kyushu-u.ac.jp/image/manifest/1/820/411193.json After entering the manifest file and clicking the “Load” button, a list of images is displayed as shown below: ...

September 1, 2025 · 3 min · Nakamura

Using Roma to Restrict Allowed Values for Tag Attributes

Overview This is a note on how to restrict the allowed values for tag attributes using Roma. Background In the following article, I described how to restrict the attributes available for a tag, for example, making only the key and type attributes available for the persName tag. In this article, I go a step further and restrict the allowed values for specific attributes, for example, allowing only “right marginal note” or “left marginal note” as values of the type attribute. ...

October 28, 2024 · 1 min · Nakamura

Using Roma to Restrict Attributes for Tags According to Your Project

Overview This is a personal note on how to restrict attributes used for tags according to your project using Roma. Background In the following article, I described how to restrict tags according to your project using Roma. This time, as an extension of that, we will customize the attributes used for each tag. Use Case Here, as an example, we will try restricting the available attributes for persName. When using the default (tei_all.rng) with Oxygen XML Editor, as shown below, many options are presented as available attributes for the persName tag. ...

October 28, 2024 · 7 min · Nakamura

LEAF Writer: Customizing Schemas

Overview These are my notes from investigating how to customize LEAF Writer. https://gitlab.com/calincs/cwrc/leaf-writer/leaf-writer This time, the topic is how to customize schemas. The goal is to display Japanese translations and other customizations as shown below. Below is the display before customization. Based on the following schema, many elements are displayed with English descriptions. https://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng Method Specify the schema file as follows. https://github.com/kouigenjimonogatari/kouigenjimonogatari.github.io/blob/master/xml/lw/01.xml Specifically: <?xml-model href="https://kouigenjimonogatari.github.io/lw/tei_genji.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?> LEAF Writer reads this schema file and uses it both for validation and for presenting the available elements. ...

June 29, 2024 · 3 min · Nakamura

Schemas Convertible from TEI ODD: RNG, XSD, DTD, and More

Overview In the following article, I tried creating an ODD. That article uses a tool called Roma, and you can see that the created ODD can be exported in the following output formats. Specifically, the available formats are “RELAX NG Schema,” “RELAX NG Compact,” “W3C Schema,” “Document Type Definition,” and “ISO Schematron Constraints.” I asked GPT-4 about the differences between these formats and am sharing the results here. There may be some inaccuracies, but I hope this serves as a useful reference. ...

November 4, 2023 · 5 min · Nakamura

Using Roma to Limit Tags for Your Project and Generate Documentation

Overview I previously explained how to use Roma in the following article. This time, I will explain the workflow for creating a TEI ODD (One Document Does it all) and documentation (HTML and PDF) for the TEI/XML files you have at hand. Note that at the end of this article, I have included GPT-4’s response regarding the differences between ODD and RNG (RELAX NG). Please refer to that as well. Obtaining a List of Tags Used First, obtain a list of tags used in your project. ...

November 3, 2023 · 12 min · Nakamura