Introduction

TEI (Text Encoding Initiative) is a widely used standard for encoding humanities texts in digital form. This article presents a case study that uses the Processing Model, a feature introduced in TEI P5, to convert TEI XML into multiple formats (HTML, LaTeX/PDF, EPUB3).

https://www.tei-c.org/Vault/P5/3.0.0/doc/tei-p5-doc/en/html/TD.html#TDPM

The case study uses texts published in the “Koui Genji Monogatari” (Collated Tale of Genji) project.

https://kouigenjimonogatari.github.io/

Background

Previously, conversion processes were performed individually, as introduced in the following articles.

  • Customization of ODD/RNG files to limit the tags used
  • Conversion to HTML using XSLT
  • Conversion to TeX/PDF using XSLT
  • Conversion to EPUB

Each of these efforts required its own file of conversion rules, and keeping those separate files consistent with one another was a maintenance burden.

What is Processing Model?

The Processing Model is a mechanism for describing conversion rules for TEI elements declaratively. Previously, a separate XSLT stylesheet had to be written for each output format. With the Processing Model:

  • Conversion rules can be defined within the ODD file
  • Multiple output formats can be supported (web, latex, epub, etc.)
  • Schema and conversion rules can be centrally managed

Structure of Processing Model

    <elementSpec ident="persName" mode="change">
      <desc>Personal name</desc>
      <model>
        <!-- HTML output -->
        <modelSequence output="web">
          <model behaviour="inline">
            <outputRendition>span</outputRendition>
            <desc>Inline span for person name</desc>
          </model>
        </modelSequence>
        <!-- EPUB3 output -->
        <modelSequence output="epub">
          <model behaviour="inline">
            <outputRendition>span</outputRendition>
            <desc>Inline span for person name in EPUB3</desc>
          </model>
        </modelSequence>
        <!-- LaTeX output -->
        <modelSequence output="latex">
          <model behaviour="inline">
            <outputRendition>\person</outputRendition>
            <desc>Custom LaTeX command for person names</desc>
          </model>
        </modelSequence>
      </model>
    </elementSpec>

Key elements:

  • elementSpec/@ident: Target TEI element name
  • modelSequence/@output: Output mode (web, latex, epub, etc.)
  • model/@behaviour: Conversion behavior (inline, block, paragraph, break, omit, etc.)
  • outputRendition: Output element name or command
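To make the roles of these attributes concrete, the following minimal sketch reads them back out of an ODD fragment with Python's standard library. The fragment, the `extract_rules` helper, and the assumption that the ODD is in the TEI namespace are illustrative; this is not the project's own parser.

```python
import xml.etree.ElementTree as ET

# Assumption: the ODD elements live in the TEI namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical ODD fragment shaped like the structure described above.
ODD_SNIPPET = r"""
<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="persName" mode="change">
  <model>
    <modelSequence output="web">
      <model behaviour="inline">
        <outputRendition>span</outputRendition>
      </model>
    </modelSequence>
    <modelSequence output="latex">
      <model behaviour="inline">
        <outputRendition>\person</outputRendition>
      </model>
    </modelSequence>
  </model>
</elementSpec>
"""


def extract_rules(odd_xml):
    """Return (ident, output, behaviour, rendition) tuples found in an ODD string."""
    rules = []
    root = ET.fromstring(odd_xml)
    # iter() also yields the root itself, so a bare elementSpec works too.
    for spec in root.iter(TEI + "elementSpec"):
        for seq in spec.iter(TEI + "modelSequence"):
            model = seq.find(TEI + "model")
            if model is None:
                continue
            rendition = model.findtext(TEI + "outputRendition", default="")
            rules.append((spec.get("ident"), seq.get("output"),
                          model.get("behaviour"), rendition.strip()))
    return rules


for rule in extract_rules(ODD_SNIPPET):
    print(rule)
```

Each tuple pairs one TEI element with one output mode, which is the unit a generator would turn into a single template.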

Implementation Architecture

This project adopted a two-layer architecture based on the principle of Separation of Concerns:

1. Processing Model Layer (Auto-generated)

Basic element conversion rules are auto-generated from the Processing Model definitions in the ODD file:

    odd_with_pm.odd (Processing Model definitions)
        ↓ (odd_to_xslt.py --output-mode web)
    tei_elements_html.xsl (Basic HTML conversion)
        ↓ (odd_to_xslt.py --output-mode latex)
    tei_elements_latex.xsl (Basic LaTeX conversion)
        ↓ (odd_to_xslt.py --output-mode epub)
    tei_elements_epub.xsl (Basic EPUB3 conversion)
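As a rough illustration of what auto-generation means here, the sketch below turns one rule (element name, behaviour, rendition) into the text of an XSLT template for the web output mode. The `template_for` function, the `DEFAULT_TAGS` mapping, and the use of a `class` attribute are assumptions made for this example; the real odd_to_xslt.py may produce different markup.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical fallback tags when a rule gives no outputRendition.
DEFAULT_TAGS = {"inline": "span", "block": "div", "paragraph": "p"}


def template_for(ident, behaviour, rendition=None):
    """Return XSLT template text for one TEI element in the web output mode."""
    match = quoteattr("tei:" + ident)
    if behaviour == "omit":
        # An empty template suppresses the element and its content.
        return f"<xsl:template match={match}/>"
    if behaviour == "break":
        return f"<xsl:template match={match}><br/></xsl:template>"
    tag = rendition or DEFAULT_TAGS.get(behaviour, "span")
    return (f"<xsl:template match={match}>"
            f"<{tag} class={quoteattr(ident)}><xsl:apply-templates/></{tag}>"
            f"</xsl:template>")


print(template_for("persName", "inline", "span"))
print(template_for("lb", "break"))
```

Keeping the generated templates this small is what lets the wrapper layer import them and override only the elements that need richer behaviour.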

2. Wrapper Layer (Manually Created)

Implements format-specific functionality:

  • HTML Wrapper (html_wrapper.xsl)

    • Integration of Mirador IIIF viewer
    • JavaScript (page navigation, highlighting)
    • Tailwind CSS styling
    • Vertical text display
    • Metadata modal
  • LaTeX Wrapper (tex_wrapper.xsl)

    • ltjtarticle document class
    • LuaLaTeX Japanese support
    • Custom geometry
    • Color command definitions
  • EPUB3 Generation Tool (tei_to_epub.py)

    • EPUB structure file generation (container.xml, content.opf, nav.xhtml)
    • Vertical text CSS
    • ZIP packaging
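The ZIP packaging step has one detail worth showing: in an EPUB container the `mimetype` entry must come first and be stored uncompressed. The sketch below assumes a hypothetical `package_epub` helper and an `OEBPS/` layout; it is not taken from tei_to_epub.py.

```python
import io
import zipfile

# Standard OCF container file; the OEBPS/content.opf path is an assumption.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_epub(files):
    """Pack a {path: content} mapping into EPUB bytes, writing mimetype first."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must be the first entry and must not be compressed.
        archive.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for path, content in files.items():
            archive.writestr(path, content, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


epub_bytes = package_epub({
    "META-INF/container.xml": CONTAINER_XML,
    "OEBPS/content.opf": "<package/>",  # placeholder, not a valid package file
})
with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
    print(archive.namelist())
```

A real package would also need a complete content.opf, nav.xhtml, and the XHTML content documents.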

Implementation Steps

Step 1: Add Processing Model Definitions to ODD

    <!-- Example for seg element -->
    <elementSpec ident="seg" mode="change">
      <desc>Text segment with optional correspondence link</desc>
      <model>
        <modelSequence output="web">
          <model behaviour="inline">
            <desc>Inline span with data attributes for JavaScript processing</desc>
          </model>
        </modelSequence>
        <modelSequence output="epub">
          <model behaviour="inline">
            <desc>Inline span for EPUB3</desc>
          </model>
        </modelSequence>
        <modelSequence output="latex">
          <model behaviour="paragraph">
            <desc>Paragraph with medium skip</desc>
          </model>
        </modelSequence>
      </model>
    </elementSpec>

In the Koui Genji Monogatari project, Processing Models were defined for the following elements:

  • seg: Text segment (inline in HTML, paragraph in LaTeX)
  • lb: Line break (<br/> in HTML, omitted in LaTeX)
  • pb: Page break (inline marker in HTML, omitted in LaTeX)
  • persName: Person name (<span> in HTML, \person{} command in LaTeX)
  • placeName: Place name (<span> in HTML, \place{} command in LaTeX)
  • body, div, p: Structural elements

Step 2: Create the XSLT Generation Tool

A Python tool, odd_to_xslt.py, was developed to auto-generate XSLT from the Processing Model definitions:

    from abc import ABC, abstractmethod
    from typing import List


    class XSLTGeneratorBase(ABC):
        """Base class for XSLT generation"""

        @abstractmethod
        def generate_header(self) -> List[str]:
            """Generate XSLT header"""
            pass

        @abstractmethod
        def _generate_inline(self, element, rendition, params):
            """Process inline behaviour"""
            pass

        # Other behaviour processing...


    class HTMLGenerator(XSLTGeneratorBase):
        """XSLT generation for HTML"""
        # HTML-specific implementation


    class LaTeXGenerator(XSLTGeneratorBase):
        """XSLT generation for LaTeX"""
        # LaTeX-specific implementation


    class EPUBGenerator(HTMLGenerator):
        """XSLT generation for EPUB3 (mostly same as HTML)"""
        # XHTML5-compliant implementation

Usage:

    # For HTML
    python3 odd_to_xslt.py --output-mode web odd_with_pm.odd tei_elements_html.xsl

    # For LaTeX
    python3 odd_to_xslt.py --output-mode latex odd_with_pm.odd tei_elements_latex.xsl

    # For EPUB3
    python3 odd_to_xslt.py --output-mode epub odd_with_pm.odd tei_elements_epub.xsl
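A command line of this shape could be parsed as in the sketch below. The argument names `odd` and `xslt`, the `build_parser` function, and the restriction to three modes are assumptions for illustration; the actual option handling in odd_to_xslt.py is not shown in this article.

```python
import argparse


def build_parser():
    """Build a parser for: --output-mode MODE ODD_FILE XSLT_FILE (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="odd_to_xslt.py",
        description="Generate XSLT from Processing Model definitions in an ODD file.",
    )
    parser.add_argument("--output-mode", required=True,
                        choices=["web", "latex", "epub"],
                        help="which modelSequence/@output to generate templates for")
    parser.add_argument("odd", help="input ODD file")
    parser.add_argument("xslt", help="output XSLT file")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["--output-mode", "web", "odd_with_pm.odd", "tei_elements_html.xsl"]
)
print(args.output_mode, args.odd, args.xslt)
```

Restricting `--output-mode` with `choices` makes a mistyped mode fail immediately instead of silently producing an empty stylesheet.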

Step 3: Create Wrapper XSLT

Import the generated XSLT and add format-specific functionality:

    <!-- html_wrapper.xsl -->
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:tei="http://www.tei-c.org/ns/1.0">
      <!-- Import Processing Model generated XSLT -->
      <xsl:import href="tei_elements_html.xsl"/>
      <!-- Override root template -->
      <xsl:template match="/">
        <xsl:apply-templates select="tei:TEI"/>
      </xsl:template>
      <!-- Custom HTML document structure -->
      <xsl:template match="tei:TEI">
        <html>
          <head>
            <!-- Mirador, Tailwind CSS, custom styles -->
          </head>
          <body>
            <!-- Header, metadata modal, main content, Mirador viewer -->
            <script>
              // JavaScript for navigation, highlighting, etc.
            </script>
          </body>
        </html>
      </xsl:template>
      <!-- Override specific elements (as needed) -->
      <xsl:template match="tei:pb">
        <!-- Link to IIIF Canvas ID -->
      </xsl:template>
    </xsl:stylesheet>

Step 4: Execute Conversion

Conversion to each format:

    # HTML generation
    saxon -xsl:html_wrapper.xsl -s:01.xml -o:01.html

    # LaTeX/PDF generation
    saxon -xsl:tex_wrapper.xsl -s:01.xml -o:01.tex
    lualatex -interaction=nonstopmode 01.tex

    # EPUB3 generation
    python3 tei_to_epub.py --xsl=tei_elements_epub.xsl 01.xml 01.epub
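When several source files need converting, the same commands can be assembled programmatically. The sketch below only builds and prints the argument lists; the `conversion_commands` helper is hypothetical, and it assumes `saxon`, `lualatex`, and `python3` are available on the PATH when the commands are actually run (for example with `subprocess.run(cmd, check=True)`).

```python
import shlex


def conversion_commands(stem):
    """Return the command lines for one TEI file, grouped by output format."""
    source = f"{stem}.xml"
    return {
        "html": [
            ["saxon", "-xsl:html_wrapper.xsl", f"-s:{source}", f"-o:{stem}.html"],
        ],
        "pdf": [
            ["saxon", "-xsl:tex_wrapper.xsl", f"-s:{source}", f"-o:{stem}.tex"],
            ["lualatex", "-interaction=nonstopmode", f"{stem}.tex"],
        ],
        "epub": [
            ["python3", "tei_to_epub.py", "--xsl=tei_elements_epub.xsl",
             source, f"{stem}.epub"],
        ],
    }


for fmt, commands in conversion_commands("01").items():
    for cmd in commands:
        print(fmt, "->", shlex.join(cmd))
```

Passing argument lists rather than shell strings avoids quoting problems if a file name contains spaces.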

Output Results

Three formats were generated from a single TEI XML file (01.xml):

Format    File Size           Features
HTML      115KB               Mirador IIIF viewer integration, vertical text, interactive navigation
PDF       201KB (8 pages)     LuaLaTeX Japanese typesetting, landscape layout, color display
EPUB3     14KB                Vertical text e-book, XHTML5 compliant

[Screenshots of the HTML, PDF, and EPUB3 output]

Benefits of the Implementation

1. Improved Maintainability

  • Easy to modify Processing Model: Just edit the ODD and regenerate the XSLT
  • Separation of element conversion and presentation: Basic conversion and interactive features are independent
  • Centralized management: Schema and conversion rules are consolidated in the ODD

2. Reusability

  • Reuse of basic conversion XSLT: Can be used in other projects
  • Wrapper customization: Adapts to project-specific requirements

3. Declarative Description

  • Readability: Processing Model is easier to understand than imperative XSLT
  • Documentation: <desc> explicitly states the intent of rules

4. Consistency

  • Consistency across multiple formats: Generated from the same ODD
  • Synchronization of schema and implementation: Definition and implementation stay in sync
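Because every format is driven by the same ODD, consistency can also be checked mechanically. The sketch below reports elements whose Processing Model lacks one of the three output modes; the `missing_outputs` helper, the `REQUIRED_OUTPUTS` set, and the sample fragment are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the ODD elements live in the TEI namespace.
TEI = "{http://www.tei-c.org/ns/1.0}"
REQUIRED_OUTPUTS = {"web", "latex", "epub"}

# Hypothetical ODD fragment: lb is only defined for the web output.
SAMPLE_ODD = """
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="sample">
  <elementSpec ident="lb" mode="change">
    <model>
      <modelSequence output="web"><model behaviour="break"/></modelSequence>
    </model>
  </elementSpec>
  <elementSpec ident="pb" mode="change">
    <model>
      <modelSequence output="web"><model behaviour="inline"/></modelSequence>
      <modelSequence output="latex"><model behaviour="omit"/></modelSequence>
      <modelSequence output="epub"><model behaviour="inline"/></modelSequence>
    </model>
  </elementSpec>
</schemaSpec>
"""


def missing_outputs(odd_xml):
    """Map each element ident to the sorted output modes it does not define."""
    report = {}
    root = ET.fromstring(odd_xml)
    for spec in root.iter(TEI + "elementSpec"):
        found = {seq.get("output") for seq in spec.iter(TEI + "modelSequence")}
        gaps = REQUIRED_OUTPUTS - found
        if gaps:
            report[spec.get("ident")] = sorted(gaps)
    return report


print(missing_outputs(SAMPLE_ODD))
```

Running such a check before regenerating the XSLT would catch an element that was added for one format and forgotten for the others.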

Challenges and Solutions: Processing Model Execution Environment

Few tools can execute a Processing Model directly; TEI Publisher is one of the few that can.

For this project, we therefore developed a custom XSLT generation tool (odd_to_xslt.py) that generates XSLT skeletons from the Processing Model definitions.

Summary

By using TEI Processing Model:

  1. Declarative and maintainable conversion rules can be written
  2. Multiple formats (HTML, LaTeX/PDF, EPUB3) can be centrally managed
  3. Separation of concerns allows independent management of basic conversion and format-specific features
  4. High reusability makes it applicable to other TEI projects

In the Koui Genji Monogatari project, this approach achieved:

  • Generation of 3 output formats from a single ODD file
  • Interactive web viewer (Mirador integration)
  • PDF (LuaLaTeX Japanese typesetting)
  • E-book format (vertical text EPUB3)

References

Source Code

All code introduced in this article is published in the following repository:

    root/
      genji/
        odd_with_pm.odd               # Processing Model definitions
        tei_elements_*.xsl            # Generated XSLT
        html_wrapper.xsl              # HTML wrapper
        tex_wrapper.xsl               # LaTeX wrapper
        README_processing_model.md    # Detailed documentation
      tools/
        odd_to_xslt.py                # XSLT generation tool
        tei_to_epub.py                # EPUB3 generation tool