Building a DOCX to TEI/XML Conversion Tool in the Browser Using the TEI Garage API

Introduction TEI (Text Encoding Initiative) is an international standard for digitally structuring texts in the humanities. It is used in libraries, museums, and academic research, but writing TEI/XML directly requires knowledge of markup, making the barrier to entry high. This is where conversion tools from Microsoft Word (.docx) to TEI/XML come in. A well-known example is TEI Garage (formerly OxGarage), but its multi-purpose nature makes the UI somewhat complex. This time, I created a simple browser-based tool specialized for DOCX to TEI/XML conversion. ...
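
As a rough illustration of what a DOCX to TEI/XML tool has to assemble before it calls the service, the sketch below builds a conversion endpoint URL. The host name, the path layout, and the format identifiers are assumptions about the public TEI Garage instance, not details taken from this article, so check them against the service documentation before relying on them. `conversion_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumed base URL of the public TEI Garage instance (not confirmed by the article).
TEIGARAGE_BASE = "https://teigarage.tei-c.org/ege-webservice/Conversions"

# Assumed format identifiers in "name:type:subtype" form.
DOCX = "docx:application:vnd.openxmlformats-officedocument.wordprocessingml.document"
TEI = "TEI:text:xml"


def conversion_url(source_format: str, target_format: str,
                   base: str = TEIGARAGE_BASE) -> str:
    """Join percent-encoded format identifiers into one endpoint URL."""
    # safe="" makes quote() encode ":" as %3A in each path segment.
    steps = [quote(fmt, safe="") for fmt in (source_format, target_format)]
    return base + "/" + "/".join(steps) + "/"


if __name__ == "__main__":
    print(conversion_url(DOCX, TEI))
```

A browser tool would then send the .docx file to this URL and receive TEI/XML in the response; the request details are left out here because the excerpt does not describe them.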

March 1, 2026 · 4 min · Nakamura

Exporting Web Annotations via the Hypothes.is API and Converting to TEI/XML

Introduction Hypothes.is is an open-source annotation tool that allows you to add highlights and comments on web pages. It can be easily used through browser extensions or JavaScript embedding, but there are cases where you may want to back up accumulated annotations or utilize them in other formats such as TEI/XML. This article introduces how to export annotations using the Hypothes.is API and convert them to TEI/XML. Obtaining an API Key: (1) Log in to Hypothes.is. (2) Go to Developer settings. (3) Generate an API key with “Generate your API token”. Save the obtained key in a .env file. ...
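
The export step can be sketched as follows. This is a minimal outline, assuming the Hypothes.is search endpoint with a Bearer token and the `uri` and `text` fields of an annotation. The function names and the mapping to a TEI `<note>` element are illustrative choices, not the article's actual conversion. The request is only built, not sent, so the sketch runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.etree import ElementTree as ET

API_SEARCH = "https://api.hypothes.is/api/search"


def build_search_request(token: str, user: str, limit: int = 200) -> Request:
    """Prepare (but do not send) an authenticated annotation search request."""
    query = urlencode({"user": user, "limit": limit})
    return Request(
        API_SEARCH + "?" + query,
        headers={"Authorization": "Bearer " + token},
    )


def annotation_to_note(annotation: dict) -> ET.Element:
    """Map one annotation to a TEI-style <note>; this mapping is an assumption."""
    note = ET.Element("note", {
        "type": "annotation",
        "target": annotation.get("uri", ""),
    })
    note.text = annotation.get("text", "")
    return note


if __name__ == "__main__":
    # Hypothetical token and account name, for illustration only.
    request = build_search_request("YOUR_TOKEN", "acct:example@hypothes.is")
    print(request.full_url)
    sample = {"uri": "https://example.org/", "text": "A comment"}
    print(ET.tostring(annotation_to_note(sample), encoding="unicode"))
```

In practice the token would be read from the .env file rather than written into the script.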

February 28, 2026 · 16 min · Nakamura

Trying "oitei" - An Automatic Conversion Tool from OpenITI mARkdown to TEI XML

Introduction In the OpenITI (Open Islamicate Texts Initiative) project, which handles historical texts from the Islamicate world, texts can be tagged using a lightweight notation called mARkdown instead of TEI/XML. While TEI/XML is a powerful international standard for structuring texts, it has problems with right-to-left (RTL) languages like Arabic, where mixing XML tags causes display issues in editors. mARkdown was designed to solve this problem. In this article, we will try running oitei, a Python tool that automatically converts mARkdown texts to TEI XML. ...

February 28, 2026 · 17 min · Nakamura

ODD Editing Tips: Part 1

Restricting an Element’s Attributes to Specific Ones Only By default in TEI, elements inherit many attribute classes (att.global, att.datable, etc.), making numerous attributes available. If you want to allow only specific attributes, configure it as follows. Example: Allowing Only xml:id and corresp on persName

    <elementSpec ident="persName" mode="change">
      <classes mode="change">
        <!-- Delete attribute classes (model classes are retained) -->
        <memberOf key="att.global" mode="delete"/>
        <memberOf key="att.cmc" mode="delete"/>
        <memberOf key="att.datable" mode="delete"/>
        <memberOf key="att.editLike" mode="delete"/>
        <memberOf key="att.personal" mode="delete"/>
        <memberOf key="att.typed" mode="delete"/>
      </classes>
      <attList>
        <attDef ident="xml:id" mode="add" usage="opt">
          <desc>Unique identifier for the element</desc>
          <datatype>
            <dataRef name="ID"/>
          </datatype>
        </attDef>
        <attDef ident="corresp" mode="add" usage="opt">
          <desc>Link to related person information</desc>
          <datatype>
            <dataRef key="teidata.pointer"/>
          </datatype>
        </attDef>
      </attList>
    </elementSpec>

Key Points: Use <classes mode="change">; if you use mode="replace" and leave it empty, the model classes are also deleted, making the element itself unusable. Delete attribute classes individually by removing each unneeded one with <memberOf key="att.xxx" mode="delete"/>. Add the required attributes by defining each one with <attDef ident="xxx" mode="add">.

Notes: You can check which attribute classes an element belongs to in the TEI Guidelines. Deleting att.global also removes xml:id, xml:lang, etc., so add them back individually as needed.

Adding Attributes to an Element When adding a new attribute while keeping existing attribute classes: ...

January 27, 2026 · 5 min · Nakamura

Constraint Design for IIIF-Compatible Facsimile Description Using TEI ODD

Introduction When describing metadata for digital images in TEI (Text Encoding Initiative), the facsimile element is used. Particularly in IIIF (International Image Interoperability Framework) compatible digital archives, it is important to properly describe references to manifests, canvases, and the Image API. This article introduces how to define the constraints needed for facsimile descriptions as a schema using ODD (One Document Does it all). Guidelines Followed This ODD is based on the “Linking with IIIF Images” specification introduced in the Japanese TEI guidelines: ...

December 10, 2025 · 15 min · Nakamura

ODD Chain Tutorial

This is a tutorial for learning how to customize schemas using the “chain” feature of TEI ODD. What is ODD Chain There are two approaches to ODD chaining: 1. Inheritance (Vertical Chain) References a parent ODD using the source attribute to inherit customizations. TEI_all → Base ODD → Derived ODD → Further derivation... 2. Combination (Horizontal Chain) Uses specGrp and specGrpRef to integrate multiple ODDs. ...

December 9, 2025 · 12 min · Nakamura

Customizing the TEI Classical Text Viewer to Display Illegible Sections (gap)

Introduction When digitizing East Asian classical texts, it has become common to mark them up in XML following TEI (Text Encoding Initiative) guidelines. The “TEI Classical Text Viewer” developed by the International Institute of Humanistic Research is a convenient tool that can easily display such TEI/XML files in a browser. Official site: https://tei.dhii.jp/teiviewer4eaj Web version: https://candra.dhii.jp/nagasaki/tei/tei_viewer/ This time, I customized this viewer to support displaying <gap> tags that indicate illegible sections. This article introduces the customization method. ...

December 9, 2025 · 14 min · Nakamura

Declarative Multi-Format Conversion with TEI Processing Model

Introduction TEI (Text Encoding Initiative) is a widely used standard for digitizing humanities texts. This article introduces a case study of using the Processing Model feature introduced in TEI P5 to achieve conversion from TEI XML to multiple formats (HTML, LaTeX/PDF, EPUB3). https://www.tei-c.org/Vault/P5/3.0.0/doc/tei-p5-doc/en/html/TD.html#TDPM The target project uses texts published in the “Koui Genji Monogatari” (Collated Tale of Genji) as an example. https://kouigenjimonogatari.github.io/ Background Previously, conversion processes were performed individually, as introduced in the following articles. ...

October 8, 2025 · 19 min · Nakamura

Guide to Publishing TEI/XML Files on GitHub

Introduction This article explains the procedure for uploading TEI (Text Encoding Initiative) format XML files to GitHub and creating URLs that anyone can access. TEI/XML is an international standard format for structurally describing texts such as historical documents and literary works. By using GitHub, you can share your research data with researchers around the world. What You Need: a computer (Windows, Mac, or Linux); an internet connection; TEI/XML files (that you already have); an email address (for creating a GitHub account). About Sample Files If you don’t have TEI/XML files, you can use the following TEI/XML file from the Koui Genji Monogatari for practice: ...

September 6, 2025 · 7 min · Nakamura

TEI ODD File Customization: A Case Study with NDL Classical Book OCR

Introduction TEI (Text Encoding Initiative) is an international standard for digitizing and sharing texts in humanities research. This article introduces the process of customizing a TEI ODD file to match the output format of the NDL Classical Book OCR-Lite application. ODD (One Document Does it all) is a mechanism for customizing TEI schemas, allowing you to define your own schema containing only the necessary elements and attributes. Background: Development of the NDL Classical Book OCR-Lite Application I am creating an application that outputs NDL Classical Book OCR-Lite results in TEI/XML. The purpose of this application is to perform OCR on Japanese classical books and output the results in standard TEI format. ...

September 5, 2025 · 22 min · Nakamura

Converting ODD to RNG/HTML Using the TEI Garage API

Introduction Generating schemas (RNG) and documentation (HTML) from TEI (Text Encoding Initiative) ODD (One Document Does it all) files is an important process in TEI projects. This article analyzes the mechanism of the TEI Garage API used internally by Roma (the TEI ODD editor) and introduces how to call the API directly from scripts to convert ODD files. What is TEI Garage? TEI Garage is a web service provided by the TEI community that can perform conversions between various formats. It provides the following capabilities for processing ODD files in particular: ...

September 3, 2025 · 33 min · Nakamura

Implementation Guide for TEI XML Schema Combining RELAX NG and Schematron

Note: After manual verification, an AI wrote this article. Introduction When editing TEI (Text Encoding Initiative) XML, in addition to structural validation of elements and attributes, more complex business rule validation may be needed. This article explains how to combine RELAX NG (RNG) and Schematron to achieve both structural and content validation, using challenges encountered in an actual project as examples. The Problem to Solve When editing classical Japanese literary texts in TEI XML, the following requirements arose: ...

August 9, 2025 · 20 min · Nakamura

Creating Project-Specific RNG Files Using Generative AI

Overview When editing TEI/XML files, changing the RNG file used for validation allows you to limit the tags and attributes available. This offers benefits such as preventing workers from being confused by tag choices and reducing inconsistencies in the created TEI/XML. As a method for editing RNG files, using Roma is common, as introduced in the following article. This is a top-down approach to limiting available tags and attributes, but this time we try creating an RNG file bottom-up from existing TEI/XML using generative AI. ...
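
A bottom-up approach starts from what the existing files actually contain. The sketch below takes one possible first step: it lists every element and the attributes used on it, which could then be handed to a generative AI or turned into a schema by hand. It is an illustration under that assumption, not the article's workflow, and `collect_usage` and `SAMPLE` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Set
from xml.etree import ElementTree as ET


def local_name(qualified: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return qualified.rsplit("}", 1)[-1]


def collect_usage(xml_text: str) -> Dict[str, Set[str]]:
    """Return each element name with the set of attribute names seen on it."""
    usage: Dict[str, Set[str]] = defaultdict(set)
    for element in ET.fromstring(xml_text).iter():
        attributes = usage[local_name(element.tag)]
        # Namespaced attributes such as xml:id are reduced to their local name.
        attributes.update(local_name(name) for name in element.attrib)
    return dict(usage)


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p xml:id="p1">A letter from <persName corresp="#p001">Genji</persName>.</p>
  </body></text>
</TEI>"""

if __name__ == "__main__":
    for name, attributes in sorted(collect_usage(SAMPLE).items()):
        print(name, sorted(attributes))
```

Running the same function over every file in a project and merging the results gives the full set of tags and attributes that a project-specific RNG file would need to allow.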

August 1, 2025 · 37 min · Nakamura

Trying Out the Viewer from the "Pre-modern Japan-Asia Relations Digital Archive"

Overview The “Pre-modern Japan-Asia Relations Digital Archive” was released on July 25, 2025. https://asia-da.lit.kyushu-u.ac.jp/ The viewer is also available at: https://github.com/localmedialabs/tei_comparative_viewer In this article, I share my experience trying out this viewer. As a result, I was able to self-host it as shown below: https://tei-comparative-viewer.aws.ldas.jp/ It loads the following XML file of “Kaitoshokokki” (Record of Countries and Peoples in the Eastern Sea): https://asia-da.lit.kyushu-u.ac.jp/viewer/300 Running Locally Detailed instructions are provided at the following link, which I followed to get it running: ...

July 29, 2025 · 33 min · Nakamura

Trying Odeuropa-Related Tools

Overview I had the opportunity to try tools related to Odeuropa, so this is a memo about that experience. What is Odeuropa? An explanation is available on the following page. https://odeuropa.eu/ Below is a machine-translated description. Odeuropa is an innovative EU-funded project that studies Europe’s “olfactory cultural heritage.” Project objectives: To investigate and document the role that smells have played in European culture from 1600 to 1920. Using cutting-edge AI technology, it extracts smell-related information from approximately 43,000 images and 167,000 historical texts (in English, Italian, French, Dutch, German, and Slovenian). ...

July 24, 2025 · 19 min · Nakamura

Trying DToC: Dynamic Table of Contexts

Overview I had an opportunity to try DToC: Dynamic Table of Contexts, so this is a memorandum. https://www.leaf-vre.org/docs/features/dtoc The machine-translated description is as follows: It brings innovation to electronic reading by combining the power of semantic markup with book navigation features. The traditional overview functions of printed books – the table of contents and keyword index – are dynamically integrated with full-text search and tag-based indexing features, creating a new reading experience. ...

July 16, 2025 · 5 min · Nakamura

Fixing the 'ref' Bug in DHConvalidator

This article was partially written by AI. Overview DHConvalidator is a tool for converting Digital Humanities (DH) conference abstracts into a consistent TEI (Text Encoding Initiative) text base. https://github.com/ADHO/dhconvalidator When using this tool, the following error occurred during the conversion process from Microsoft Word format (DOCX) to TEI XML format: ERROR: nu.xom.ParsingException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'ref' This article shares the cause and solution for this issue. ...

June 27, 2025 · 23 min · Nakamura

Building an API Server for Searching the Koui Genji Monogatari Text DB

Overview This is a personal note on building an API server for searching the Koui Genji Monogatari (Collated Tale of Genji) Text DB. https://genji-api.aws.ldas.jp/ Background The following page publishes text data of “Koui Genji Monogatari” in TEI/XML compliant format. https://kouigenjimonogatari.github.io/ I created an API that registers this text data in Elasticsearch and enables searching by section (koma). Usage The following URL provides access to the documentation page using OpenAPI and Swagger. ...

June 25, 2025 · 24 min · Nakamura

Creating TEI/XML Files from IIIF Manifest Files Using NDL Kotenseki OCR-Lite

Overview This article introduces a Gradio app that creates TEI/XML files from IIIF manifest files using NDL Kotenseki OCR-Lite. It can be accessed at the following URL: https://nakamura196-ndlkotenocr-lite-iiif.hf.space/ Background This is a continuation of the following articles: Previously, two separate apps were needed, but with this update, the entire conversion process can be completed within a single Gradio app. Additionally, issues such as difficulty tracking progress when processing manifest files with many image pages, and the inability to copy processing results, have been fixed. ...

June 12, 2025 · 1 min · Nakamura

Part 2: Creating Annotated IIIF Manifest Files and TEI/XML Files Using NDL Classical Book OCR-Lite

Overview In the following article, I introduced how to create annotated IIIF manifest files and TEI/XML files using NDL Classical Book OCR-Lite. Since the explanation above was insufficient in many areas, I will re-introduce how to use it. Supplement Along with writing this article, the following improvements were made. Process 1: Creating IIIF Manifest Files Added support for IIIF Presentation API v3. Process 2: Creating TEI/XML Files Added a form that accepts string input, considering the connection with Process 1. Usage Process 1: Creating IIIF Manifest Files Access the following. ...

June 6, 2025 · 2 min · Nakamura