Introduction
TEI (Text Encoding Initiative) is a widely used standard for digitizing humanities texts. This article presents a case study that uses the Processing Model, a feature of TEI P5, to convert TEI XML into multiple formats (HTML, LaTeX/PDF, EPUB3).
https://www.tei-c.org/Vault/P5/3.0.0/doc/tei-p5-doc/en/html/TD.html#TDPM
The target project uses texts published in the “Koui Genji Monogatari” (Collated Tale of Genji) as an example.
https://kouigenjimonogatari.github.io/
Background
Previously, conversion processes were performed individually, as introduced in the following articles.
- Customization of ODD/RNG files to limit the tags used
- Conversion to HTML using XSLT
- Conversion to TeX/PDF using XSLT
- Conversion to EPUB
Each of these efforts required a separate file of conversion rules, and the resulting complexity was a challenge.
What is Processing Model?
Processing Model is a mechanism for declaratively describing conversion rules for TEI elements. Previously, individual XSLT had to be written for each output format, but with Processing Model:
- Conversion rules can be defined within the ODD file
- Multiple output formats can be supported (web, latex, epub, etc.)
- Schema and conversion rules can be centrally managed
Structure of Processing Model
Key elements:
- elementSpec/@ident: Target TEI element name
- modelSequence/@output: Output mode (web, latex, epub, etc.)
- model/@behaviour: Conversion behavior (inline, block, paragraph, break, omit, etc.)
- outputRendition: Output element name or command
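To make these roles concrete, here is a minimal sketch that parses a small ODD fragment and lists the rules it contains. The fragment is hypothetical and is not taken from the project's ODD; it uses outputRendition the way the list above describes it (an element name or command), whereas the TEI Guidelines mainly describe it as holding rendition information such as CSS. The function name `list_rules` is also an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical ODD fragment (not copied from the project's ODD file).
ODD_FRAGMENT = """
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="persName" mode="change">
  <modelSequence output="web">
    <model behaviour="inline">
      <desc>Person name as an inline span</desc>
      <outputRendition>span</outputRendition>
    </model>
  </modelSequence>
  <modelSequence output="latex">
    <model behaviour="inline">
      <desc>Person name via a LaTeX command</desc>
      <outputRendition>\\person</outputRendition>
    </model>
  </modelSequence>
</elementSpec>
"""


def list_rules(odd_xml):
    """Return (element, output, behaviour, rendition) tuples from an ODD fragment."""
    ns = {"tei": TEI_NS}
    root = ET.fromstring(odd_xml)
    rules = []
    # iter() also yields the root itself, so a bare elementSpec works too.
    for spec in root.iter(f"{{{TEI_NS}}}elementSpec"):
        for seq in spec.findall("tei:modelSequence", ns):
            for model in seq.findall("tei:model", ns):
                rendition = model.findtext(
                    "tei:outputRendition", default="", namespaces=ns
                )
                rules.append(
                    (
                        spec.get("ident"),
                        seq.get("output"),
                        model.get("behaviour"),
                        rendition.strip(),
                    )
                )
    return rules


for rule in list_rules(ODD_FRAGMENT):
    print(rule)
```

Each tuple pairs one TEI element with one output mode, which is the unit a generator or renderer would work from.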
Implementation Architecture
This project adopted a two-layer architecture based on the principle of Separation of Concerns:
1. Processing Model Layer (Auto-generated)
Basic element conversion rules are auto-generated from the Processing Model definitions in the ODD file:
2. Wrapper Layer (Manually Created)
Implements format-specific functionality:
HTML Wrapper (html_wrapper.xsl)
- Integration of Mirador IIIF viewer
- JavaScript (page navigation, highlighting)
- Tailwind CSS styling
- Vertical text display
- Metadata modal
LaTeX Wrapper (tex_wrapper.xsl)
- ltjtarticle document class
- LuaLaTeX Japanese support
- Custom geometry
- Color command definitions
EPUB3 Generation Tool (tei_to_epub.py)
- EPUB structure file generation (container.xml, content.opf, nav.xhtml)
- Vertical text CSS
- ZIP packaging
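The packaging step for EPUB can be sketched with Python's standard zipfile module. This is not the project's tei_to_epub.py; the function name `build_epub`, the OEBPS/ directory, and the placeholder file contents are assumptions for illustration. What the sketch does show is the container rule that the mimetype entry comes first and is stored uncompressed, followed by the three structure files named above and a vertical-text stylesheet.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Vertical writing; the prefixed properties are for older reading systems.
VERTICAL_CSS = (
    "html {\n"
    "  -epub-writing-mode: vertical-rl;\n"
    "  -webkit-writing-mode: vertical-rl;\n"
    "  writing-mode: vertical-rl;\n"
    "}\n"
)


def build_epub(target, content_opf, nav_xhtml, extra_files=None):
    """Write an EPUB container to `target` (a path or a binary file object)."""
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry must be first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        entries = {
            "META-INF/container.xml": CONTAINER_XML,
            "OEBPS/content.opf": content_opf,
            "OEBPS/nav.xhtml": nav_xhtml,
            "OEBPS/style.css": VERTICAL_CSS,
        }
        entries.update(extra_files or {})
        for name, text in entries.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)


# Placeholder contents only: a real book needs a complete OPF and nav document.
buffer = io.BytesIO()
build_epub(buffer, "<package/>", "<html/>")
with zipfile.ZipFile(buffer) as zf:
    first = zf.infolist()[0]
    print(first.filename, first.compress_type == zipfile.ZIP_STORED)
```

Writing to an in-memory buffer keeps the example free of side effects; passing a file path instead would produce the .epub file on disk.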
Implementation Steps
Step 1: Add Processing Model Definitions to ODD
In the Koui Genji Monogatari project, Processing Models were defined for the following elements:
- seg: Text segment (inline in HTML, paragraph in LaTeX)
- lb: Line break (<br/> in HTML, omitted in LaTeX)
- pb: Page break (inline marker in HTML, omitted in LaTeX)
- persName: Person name (<span> in HTML, \person{} command in LaTeX)
- placeName: Place name (<span> in HTML, \place{} command in LaTeX)
- body, div, p: Structural elements
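The effect of these rules can be illustrated by applying them directly to a tiny fragment. The sketch below is a stand-in for what the generated XSLT would do, not the project's code: the rule table, the class attribute values, the made-up sample sentence, and the choice to end a LaTeX paragraph with a blank line are all assumptions. LaTeX special characters are not escaped here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

TEI = "{http://www.tei-c.org/ns/1.0}"

# (element, output) -> (prefix, suffix); None means the element is omitted.
RULES = {
    ("seg", "web"): ('<span class="seg">', "</span>"),
    ("seg", "latex"): ("", "\n\n"),  # paragraph: closed by a blank line
    ("lb", "web"): ("<br/>", ""),
    ("lb", "latex"): None,
    ("persName", "web"): ('<span class="persName">', "</span>"),
    ("persName", "latex"): ("\\person{", "}"),
    ("placeName", "web"): ('<span class="placeName">', "</span>"),
    ("placeName", "latex"): ("\\place{", "}"),
}


def render_text(text, output):
    """Escape character data for HTML; pass it through unchanged for LaTeX."""
    text = text or ""
    return escape(text) if output == "web" else text


def render(node, output):
    """Render one element and its descendants (but not its tail text)."""
    name = node.tag.replace(TEI, "")
    rule = RULES.get((name, output), ("", ""))  # unknown elements pass through
    if rule is None:
        return ""  # omitted element
    prefix, suffix = rule
    parts = [prefix, render_text(node.text, output)]
    for child in node:
        parts.append(render(child, output))
        # Tail text belongs to the parent, so it survives an omitted child.
        parts.append(render_text(child.tail, output))
    parts.append(suffix)
    return "".join(parts)


# Made-up sample sentence, not taken from the project's data.
SAMPLE = (
    '<seg xmlns="http://www.tei-c.org/ns/1.0">'
    "<persName>光源氏</persName>は<lb/><placeName>須磨</placeName>へ"
    "</seg>"
)

seg = ET.fromstring(SAMPLE)
print(render(seg, "web"))
print(render(seg, "latex"))
```

The same source element yields a line break in the web output and nothing in the LaTeX output, which is the per-format divergence the Processing Model is meant to declare in one place.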
Step 2: Create the XSLT Generation Tool
A Python tool, odd_to_xslt.py, was developed to auto-generate XSLT from the Processing Model definitions:
Usage:
Step 3: Create Wrapper XSLT
Import the generated XSLT and add format-specific functionality:
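As a rough illustration of the layering, the sketch below holds a small wrapper stylesheet in a Python string and checks one rule that is easy to get wrong: xsl:import has to come before every other top-level element. The file name generated_html.xsl, the page shell, and the helper `imports_come_first` are hypothetical; the real html_wrapper.xsl adds far more (viewer, scripts, styling).

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical wrapper: element-level rules come from the imported stylesheet,
# and templates defined here take precedence over imported ones.
WRAPPER_XSL = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">
  <xsl:import href="generated_html.xsl"/>
  <xsl:output method="html" encoding="UTF-8"/>
  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="//tei:titleStmt/tei:title"/></title>
      </head>
      <body>
        <xsl:apply-templates select="//tei:body"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
"""


def imports_come_first(xsl_text):
    """Return True if the stylesheet has xsl:import and nothing precedes it."""
    root = ET.fromstring(xsl_text)
    import_tag = f"{{{XSL_NS}}}import"
    seen_other = False
    found_import = False
    for child in root:
        if child.tag == import_tag:
            if seen_other:
                return False  # an import after another top-level element
            found_import = True
        else:
            seen_other = True
    return found_import


print(imports_come_first(WRAPPER_XSL))
```

Because the wrapper's own templates override imported ones, regenerating the imported file from the ODD leaves the format-specific shell untouched.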
Step 4: Execute Conversion
Conversion to each format:
Output Results
Three formats were generated from a single TEI XML file (01.xml):
| Format | File Size | Features |
|---|---|---|
| HTML | 115KB | Mirador IIIF viewer integration, vertical text, interactive navigation |
| PDF | 201KB (8 pages) | LuaLaTeX Japanese typesetting, landscape layout, color display |
| EPUB3 | 14KB | Vertical text e-book, XHTML5 compliant |
[Figure: HTML output]
[Figure: EPUB3 output]
Benefits of the Implementation
1. Improved Maintainability
- Easy to modify Processing Model: Just edit the ODD and regenerate the XSLT
- Separation of element conversion and presentation: Basic conversion and interactive features are independent
- Centralized management: Schema and conversion rules are consolidated in the ODD
2. Reusability
- Reuse of basic conversion XSLT: Can be used in other projects
- Wrapper customization: Adapts to project-specific requirements
3. Declarative Description
- Readability: Processing Model is easier to understand than imperative XSLT
- Documentation: <desc> explicitly states the intent of rules
4. Consistency
- Consistency across multiple formats: Generated from the same ODD
- Synchronization of schema and implementation: Definition and implementation stay in sync
Challenges and Solutions: Processing Model Execution Environment
Tools that can directly execute Processing Model, such as TEI Publisher, are limited.
In this effort, we developed a custom XSLT generation tool (odd_to_xslt.py) that generates XSLT skeletons from Processing Model.
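A generator of this kind can be sketched in a few lines. The code below is not odd_to_xslt.py; it is a simplified illustration under stated assumptions: rules arrive as (element, behaviour, rendition) tuples for a single output mode, only three behaviours are handled, and the emitted markup (a class attribute named after the element, <br/> for breaks) is hypothetical.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
TEI_NS = "http://www.tei-c.org/ns/1.0"


def template_for(ident, behaviour, rendition):
    """Return one xsl:template (as text) for a TEI element."""
    if behaviour == "omit":
        # An empty template suppresses the element and its content.
        return f'  <xsl:template match="tei:{ident}"/>'
    if behaviour == "break":
        body = "<br/>"
    else:
        # inline, block, etc.: wrap the content in the rendition element.
        body = (
            f'<{rendition} class="{ident}">'
            "<xsl:apply-templates/>"
            f"</{rendition}>"
        )
    return f'  <xsl:template match="tei:{ident}">{body}</xsl:template>'


def generate_xslt(rules):
    """Assemble a stylesheet skeleton from (ident, behaviour, rendition) rules."""
    templates = "\n".join(template_for(*rule) for rule in rules)
    return (
        f'<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}" '
        f'xmlns:tei="{TEI_NS}">\n'
        f"{templates}\n"
        "</xsl:stylesheet>\n"
    )


# Hypothetical rules for a web output mode.
DEMO_RULES = [
    ("seg", "inline", "span"),
    ("lb", "break", ""),
    ("persName", "inline", "span"),
]

xslt = generate_xslt(DEMO_RULES)
ET.fromstring(xslt)  # raises ParseError if the skeleton is not well-formed
print(xslt)
```

Generating text and then parsing it back is a cheap well-formedness check; a fuller tool would also need predicates, parameters, and the remaining behaviours.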
Summary
By using TEI Processing Model:
- Declarative and maintainable conversion rules can be written
- Multiple formats (HTML, LaTeX/PDF, EPUB3) can be centrally managed
- Separation of concerns allows independent management of basic conversion and format-specific features
- High reusability makes it applicable to other TEI projects
In the Koui Genji Monogatari project, this approach achieved:
- Generation of 3 output formats from a single ODD file
- Interactive web viewer (Mirador integration)
- PDF (LuaLaTeX Japanese typesetting)
- E-book format (vertical text EPUB3)
References
- TEI Guidelines - Processing Model
- TEI Publisher - Processing Model execution environment
- Koui Genji Monogatari Project
- Project tools:
  - odd_to_xslt.py: Processing Model to XSLT conversion tool
  - tei_to_epub.py: TEI to EPUB3 conversion tool
Source Code
All code introduced in this article is published in the following repository: