Building a DOCX to TEI/XML Conversion Tool in the Browser Using the TEI Garage API

Introduction TEI (Text Encoding Initiative) is an international standard for digitally structuring texts in the humanities. It is used in libraries, museums, and academic research, but writing TEI/XML directly requires knowledge of markup, making the barrier to entry high. This is where conversion tools from Microsoft Word (.docx) to TEI/XML come in. A well-known example is TEI Garage (formerly OxGarage), but its multi-purpose nature makes the UI somewhat complex. This time, I created a simple browser-based tool specialized for DOCX to TEI/XML conversion. ...

March 1, 2026 · 4 min · Nakamura

Exporting Web Annotations via the Hypothes.is API and Converting to TEI/XML

Introduction Hypothes.is is an open-source annotation tool that allows you to add highlights and comments on web pages. It can be easily used through browser extensions or JavaScript embedding, but there are cases where you may want to back up accumulated annotations or utilize them in other formats such as TEI/XML. This article introduces how to export annotations using the Hypothes.is API and convert them to TEI/XML. Obtaining an API Key: (1) Log in to Hypothes.is. (2) Go to Developer settings. (3) Generate an API key with “Generate your API token”. Save the obtained key in a .env file. ...

February 28, 2026 · 16 min · Nakamura
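
The export-and-convert flow described in the Hypothes.is entry above can be sketched roughly as follows. This is a minimal illustration, not the article's actual code: it assumes the public search endpoint at https://api.hypothes.is/api/search with a Bearer token, and the mapping of each annotation to a TEI note with a nested quote is a hypothetical choice made here for illustration. The function names are likewise made up for this sketch.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "https://api.hypothes.is/api/search"
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def build_search_request(token: str, user: str, limit: int = 50) -> urllib.request.Request:
    """Build an authenticated search request for one user's annotations."""
    query = urllib.parse.urlencode({"user": user, "limit": limit})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_annotations(token: str, user: str) -> list:
    """Call the API and return the list of annotation objects ("rows")."""
    with urllib.request.urlopen(build_search_request(token, user)) as resp:
        return json.load(resp)["rows"]


def quoted_text(annotation: dict) -> str:
    """Return the highlighted passage from a TextQuoteSelector, if present."""
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact", "")
    return ""


def annotations_to_tei(rows: list) -> ET.Element:
    """Map annotations to <note> elements (a hypothetical mapping)."""
    div = ET.Element(f"{{{TEI_NS}}}div", {"type": "annotations"})
    for row in rows:
        note = ET.SubElement(div, f"{{{TEI_NS}}}note", {"target": row["uri"]})
        quote = ET.SubElement(note, f"{{{TEI_NS}}}quote")
        quote.text = quoted_text(row)
        quote.tail = row.get("text", "")  # the annotator's comment follows the quote
    return div
```

With a token read from the environment (for example a variable loaded from the .env file), `ET.tostring(annotations_to_tei(fetch_annotations(token, "acct:USERNAME@hypothes.is")), encoding="unicode")` would serialize the result; the account name is a placeholder.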

Trying "oitei" - An Automatic Conversion Tool from OpenITI mARkdown to TEI XML

Introduction In the OpenITI (Open Islamicate Texts Initiative) project, which handles historical texts from the Islamicate world, texts can be tagged using a lightweight notation called mARkdown instead of TEI/XML. While TEI/XML is a powerful international standard for structuring texts, it has problems with right-to-left (RTL) languages like Arabic, where mixing XML tags causes display issues in editors. mARkdown was designed to solve this problem. In this article, we will try running oitei, a Python tool that automatically converts mARkdown texts to TEI XML. ...

February 28, 2026 · 17 min · Nakamura

ODD Editing Tips: Part 1

Restricting an Element’s Attributes to Specific Ones Only. By default in TEI, elements inherit many attribute classes (att.global, att.datable, etc.), which makes numerous attributes available. If you want to allow only specific attributes, configure the ODD as follows.

Example: Allowing Only xml:id and corresp on persName

    <elementSpec ident="persName" mode="change">
      <classes mode="change">
        <!-- Delete attribute classes (model classes are kept) -->
        <memberOf key="att.global" mode="delete"/>
        <memberOf key="att.cmc" mode="delete"/>
        <memberOf key="att.datable" mode="delete"/>
        <memberOf key="att.editLike" mode="delete"/>
        <memberOf key="att.personal" mode="delete"/>
        <memberOf key="att.typed" mode="delete"/>
      </classes>
      <attList>
        <attDef ident="xml:id" mode="add" usage="opt">
          <desc>Unique identifier of the element</desc>
          <datatype>
            <dataRef name="ID"/>
          </datatype>
        </attDef>
        <attDef ident="corresp" mode="add" usage="opt">
          <desc>Link to related person information</desc>
          <datatype>
            <dataRef key="teidata.pointer"/>
          </datatype>
        </attDef>
      </attList>
    </elementSpec>

Key Points. Use <classes mode="change">: if you use mode="replace" and leave it empty, the model classes are deleted as well, making the element itself unusable. Delete attribute classes individually: remove unnecessary attribute classes with <memberOf key="att.xxx" mode="delete"/>. Add required attributes: define the attributes you want to allow with <attDef ident="xxx" mode="add">.

Notes. You can check which attribute classes an element belongs to in the TEI Guidelines. Deleting att.global also removes xml:id, xml:lang, etc., so add them back individually as needed.

Adding Attributes to an Element. When adding a new attribute while keeping existing attribute classes: ...

January 27, 2026 · 5 min · Nakamura

Constraint Design for IIIF-Compatible Facsimile Description Using TEI ODD

Introduction When describing metadata for digital images in TEI (Text Encoding Initiative), the facsimile element is used. Particularly in IIIF (International Image Interoperability Framework) compatible digital archives, it is important to properly describe references to manifests, canvases, and the Image API. This article introduces how to define the constraints needed for facsimile descriptions as a schema using ODD (One Document Does it all). Guidelines Followed This ODD is based on the “Linking with IIIF Images” specification introduced in the Japanese TEI guidelines: ...

December 10, 2025 · 15 min · Nakamura

Complete Restoration of Deep Zoom Images: Converting Tile Images to BigTIFF

Introduction Deep Zoom technology is used to smoothly zoom and display high-resolution images on websites. There are cases where you need to restore the original high-resolution image from tiled image data generated by tools such as Microsoft Deep Zoom Composer. This article explains the technology for restoring original high-resolution TIFF images from image data published in Deep Zoom format. How Deep Zoom Images Work Tile Structure Deep Zoom images divide a single large image into multiple small tile images and store them in a pyramid structure: ...

November 18, 2025 · 14 min · Nakamura
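
The pyramid structure mentioned in the Deep Zoom entry above can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the article's restoration code: it assumes the usual Deep Zoom (DZI) layout in which the full-resolution level has index ceil(log2(longest side)), each lower level halves the size, and tiles are stored as `{base}_files/{level}/{column}_{row}.{format}`. The function names, the 256-pixel tile size, and the `jpg` format are illustrative defaults, and tile overlap is ignored.

```python
import math


def max_level(width: int, height: int) -> int:
    """Index of the full-resolution level: ceil(log2(longest side))."""
    return (max(width, height) - 1).bit_length()


def level_size(width: int, height: int, level: int) -> tuple:
    """Pixel dimensions of the image at a given pyramid level."""
    scale = 2 ** (max_level(width, height) - level)
    return math.ceil(width / scale), math.ceil(height / scale)


def tile_grid(width: int, height: int, level: int, tile_size: int = 256) -> tuple:
    """Number of tile columns and rows at a given level."""
    w, h = level_size(width, height, level)
    return math.ceil(w / tile_size), math.ceil(h / tile_size)


def tile_paths(base: str, width: int, height: int, level: int,
               tile_size: int = 256, fmt: str = "jpg") -> list:
    """Relative paths of every tile at one level, column by column."""
    cols, rows = tile_grid(width, height, level, tile_size)
    return [
        f"{base}_files/{level}/{col}_{row}.{fmt}"
        for col in range(cols)
        for row in range(rows)
    ]
```

Restoring the original image then amounts to fetching every tile of the highest level and pasting each one at its column and row offset, after trimming any overlap pixels declared in the image descriptor.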

Declarative Multi-Format Conversion with TEI Processing Model

Introduction TEI (Text Encoding Initiative) is a widely used standard for digitizing humanities texts. This article introduces a case study of using the Processing Model feature introduced in TEI P5 to achieve conversion from TEI XML to multiple formats (HTML, LaTeX/PDF, EPUB3). https://www.tei-c.org/Vault/P5/3.0.0/doc/tei-p5-doc/en/html/TD.html#TDPM The target project uses texts published in the “Koui Genji Monogatari” (Collated Tale of Genji) as an example. https://kouigenjimonogatari.github.io/ Background Previously, conversion processes were performed individually, as introduced in the following articles. ...

October 8, 2025 · 19 min · Nakamura

Guide to Publishing TEI/XML Files on GitHub

Introduction This article explains the procedure for uploading TEI (Text Encoding Initiative) format XML files to GitHub and creating URLs that anyone can access. TEI/XML is an international standard format for structurally describing texts such as historical documents and literary works. By using GitHub, you can share your research data with researchers around the world. What You Need: a computer (Windows, Mac, or Linux); an Internet connection; TEI/XML files (that you already have); an email address (for creating a GitHub account). About Sample Files: If you don’t have TEI/XML files, you can use the following TEI/XML file from the Koui Genji Monogatari for practice: ...

September 6, 2025 · 7 min · Nakamura

Implementation Guide for TEI XML Schema Combining RELAX NG and Schematron

After manual verification, an AI wrote this article. Introduction When editing TEI (Text Encoding Initiative) XML, in addition to structural validation of elements and attributes, more complex business rule validation may be needed. This article explains how to combine RELAX NG (RNG) and Schematron to achieve both structural and content validation, using challenges encountered in an actual project as examples. The Problem to Solve When editing classical Japanese literary texts in TEI XML, the following requirements arose: ...

August 9, 2025 · 20 min · Nakamura

Creating Project-Specific RNG Files Using Generative AI

Overview When editing TEI/XML files, changing the RNG file used for validation allows you to limit the tags and attributes available. This offers benefits such as preventing workers from being confused by tag choices and reducing inconsistencies in the created TEI/XML. As a method for editing RNG files, using Roma is common, as introduced in the following article. This is a top-down approach to limiting available tags and attributes, but this time we try creating an RNG file bottom-up from existing TEI/XML using generative AI. ...

August 1, 2025 · 37 min · Nakamura

Trying DToC: Dynamic Table of Contexts

Overview I had an opportunity to try DToC: Dynamic Table of Contexts, so this is a memorandum. https://www.leaf-vre.org/docs/features/dtoc The machine-translated description is as follows: It brings innovation to electronic reading by combining the power of semantic markup with book navigation features. The traditional overview functions of printed books – the table of contents and keyword index – are dynamically integrated with full-text search and tag-based indexing features, creating a new reading experience. ...

July 16, 2025 · 5 min · Nakamura

Fixing the 'ref' Bug in DHConvalidator

This article was partially written by AI. Overview DHConvalidator is a tool for converting Digital Humanities (DH) conference abstracts into a consistent TEI (Text Encoding Initiative) text base. https://github.com/ADHO/dhconvalidator When using this tool, the following error occurred during the conversion process from Microsoft Word format (DOCX) to TEI XML format: ERROR: nu.xom.ParsingException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'ref' This article shares the cause and solution for this issue. ...

June 27, 2025 · 23 min · Nakamura

Updating the DTS (Distributed Text Services) API for the Koui Genji Monogatari Text DB

Overview This is a memo about updating the DTS (Distributed Text Services) API for the Koui Genji Monogatari Text DB. Background The DTS (Distributed Text Services) API is described at the following link. https://distributed-text-services.github.io/specifications/ The following article introduced the creation of the DTS API. However, the following was noted as a remaining issue. Please note that the DTS API developed this time may have areas that do not comply with the above guidelines. ...

May 24, 2025 · 11 min · Nakamura

Improvements to the Polygon Annotation Support Tool for IIIF Images

Overview I made improvements to “IIIF Annotator,” a polygon annotation support tool for IIIF images. Specifically, I worked on the following three points: (1) support for manifest files that do not use an Image Server; (2) an export function for IIIF manifest files with annotations; (3) an export function for TEI/XML files. The following sections explain these improvements. Background The following article explained the reasons for creating a new annotation tool. The features added this time are also available in other tools, but were implemented for improved convenience. ...

May 20, 2025 · 4 min · Nakamura

Developing a DTS (Distributed Text Services) Viewer

Overview I developed a viewer for DTS (Distributed Text Services), so this is a memo about it. You can try it at the following URL. https://dts-viewer.vercel.app/ja/ Background The official page for DTS (Distributed Text Services) is below. https://distributed-text-services.github.io/specifications/ I also covered it in the following article. This time, I developed a viewer that partially conforms to this DTS specification. Usage The following is the top page. Enter a DTS URL in the form. Examples are provided at the bottom of the page. Technically, it uses the Entry point. ...

May 11, 2025 · 2 min · Nakamura

Creating Polylines Using the Polygon Tool in Annotorious v2

Overview This is a memo on how to create polylines using the polygon tool in Annotorious v2. Background The Annotorious v2 website is available at the following link. https://annotorious.github.io/getting-started/ As shown below, polygons can be drawn. However, a tool for drawing polylines in a similar manner did not appear to be provided, including the following plugin. https://github.com/annotorious/annotorious-v2-selector-pack Customization When a polygon like the following is created: The following JSON file is generated: ...

May 5, 2025 · 9 min · Nakamura

Prototyping a TEI/XML File Creation App Using Google Cloud Vision API and GakuNin RDM

Overview I prototyped a TEI/XML file creation app using Google Cloud Vision API and GakuNin RDM, so this is a memo of that work. Background I needed an environment for creating TEI/XML files that reflect OCR results using Google Cloud Vision API. So I prototyped an environment that uses GakuNin RDM as the backend to manage files per user and execute OCR. How to Use Creating a Folder Access the following. ...

April 16, 2025 · 3 min · Nakamura

An Example of Representing IIIF Polygon Annotations in TEI/XML

Overview This article introduces an example of representing IIIF polygon annotations in TEI/XML. Method In TEI/XML, you can represent polygon annotations using the zone tag and the points attribute. https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-teidata.point.html Example For verification purposes, I added a TEI/XML export feature to the annotation tool introduced in the following article. Specifically, the following download option was added. An example of the TEI/XML obtained as a download result is shown below. Rectangles are described using ulx, uly, lrx, lry, while polygon information is described using points. ...

April 8, 2025 · 9 min · Nakamura
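
The two encodings mentioned in the entry above (ulx, uly, lrx, lry for rectangles and points for polygons) can be sketched as follows. This is an illustrative example, not the export code of the annotation tool: the function names are hypothetical, and it assumes the TEI convention that points holds space-separated "x,y" pairs with at least three points.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def rect_zone(ulx: int, uly: int, lrx: int, lry: int) -> ET.Element:
    """Rectangle: upper-left and lower-right corners as four attributes."""
    return ET.Element(
        f"{{{TEI_NS}}}zone",
        {"ulx": str(ulx), "uly": str(uly), "lrx": str(lrx), "lry": str(lry)},
    )


def polygon_zone(points: list) -> ET.Element:
    """Polygon: vertices serialized as space-separated "x,y" pairs."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    value = " ".join(f"{x},{y}" for x, y in points)
    return ET.Element(f"{{{TEI_NS}}}zone", {"points": value})


def to_xml(zone: ET.Element) -> str:
    """Serialize a zone element to a string."""
    return ET.tostring(zone, encoding="unicode")
```

For instance, `polygon_zone([(10, 10), (50, 10), (30, 40)])` produces a zone whose points attribute is "10,10 50,10 30,40"; in a full document such zones would sit inside a surface element within facsimile.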

Scrolling to a Specific Element Using CETEIcean and XPath

Overview This is a memo on how to scroll to a specific element using CETEIcean and XPath. Demo You can try it at the following URL. https://next-ceteicean-router.vercel.app/xpath/ After accessing the page and scrolling, it is displayed as follows. Obtaining the XPath The above example targets the XML file from the “Koui Genji Monogatari Text DB.” https://kouigenjimonogatari.github.io/tei/01.xml The following XPath is specified. /TEI/text[1]/body[1]/p[1]/seg[267] To obtain this XPath, you can right-click on the target element in Oxygen XML Editor and select “Copy XPath.” ...

March 27, 2025 · 4 min · Nakamura

Application of DTS (Distributed Text Services) dts:wrapper When Building Search Systems from TEI/XML

Overview This is a note on the application of the DTS (Distributed Text Services) dts:wrapper tag when building search systems from TEI/XML. DTS (Distributed Text Services) is described as follows: Cayless, H., Clérice, T., Jonathan, R., Scott, I., & Almas, B. Distributed Text Services Specifications (Version 1-alpha) [Computer software]. https://github.com/distributed-text-services/specifications References As an example of building DTS, the following may also be helpful. Example The following “Digital Engishiki” is used as an example. ...

March 15, 2025 · 14 min · Nakamura