Overview

I created an example workflow for generating TEI/XML from data created in Excel.

The workflow produces the following TEI/XML file. It supports page breaks with the pb element, line IDs on the lb element, parallel readings with the choice/orig/reg elements, annotations with the note element, and links to IIIF images.

[TEI/XML output listing (pb, lb, choice/orig/reg, note, IIIF image links); the markup is not recoverable here]

An example visualization of the above TEI/XML data is shown below. The image, text (original), text (regularization), and annotations are displayed on the same screen.

Note that this example uses the text of the Koui Genji Monogatari (Variorum Tale of Genji). For describing textual variants properly, the app element would be more appropriate. Treat this as sample data intended only to explain the workflow.

Excel

The sample Excel file is available at the URL below. It has three sheets: image, text, and notes. Each is explained in turn.

https://github.com/nakamura196/tei_excel_tools/blob/main/demo/data/sample.xlsx?raw=true

“image” Sheet

This sheet holds the IIIF manifest and canvas for each page. Assign each row a unique page_id.

manifest | canvas | page_id | label
https://dl.ndl.go.jp/api/iiif/3437686/manifest.json | https://dl.ndl.go.jp/api/iiif/3437686/canvas/22 | page_22 | [22]
https://dl.ndl.go.jp/api/iiif/3437686/manifest.json | https://dl.ndl.go.jp/api/iiif/3437686/canvas/23 | page_23 | [23]
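
Since every page_id has to be unique, it can help to check the sheet before converting it. The following is a minimal sketch, assuming the rows of the image sheet have already been read into a list of dictionaries keyed by column name; the function name duplicate_page_ids is hypothetical and not part of the conversion tool.

```python
from collections import Counter

# Rows of the "image" sheet, using the sample values shown above.
image_rows = [
    {
        "manifest": "https://dl.ndl.go.jp/api/iiif/3437686/manifest.json",
        "canvas": "https://dl.ndl.go.jp/api/iiif/3437686/canvas/22",
        "page_id": "page_22",
        "label": "[22]",
    },
    {
        "manifest": "https://dl.ndl.go.jp/api/iiif/3437686/manifest.json",
        "canvas": "https://dl.ndl.go.jp/api/iiif/3437686/canvas/23",
        "page_id": "page_23",
        "label": "[23]",
    },
]


def duplicate_page_ids(rows):
    """Return the page_id values that occur more than once, sorted."""
    counts = Counter(row["page_id"] for row in rows)
    return sorted(page_id for page_id, n in counts.items() if n > 1)


duplicates = duplicate_page_ids(image_rows)
if duplicates:
    print("Duplicate page_id values:", duplicates)
else:
    print("All page_id values are unique.")
```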

“text” Sheet

Each row refers to a page_id defined in the image sheet and adds a new line_id. Enter the text for choice > orig in text1 and the text for choice > reg in text2.

page_id | line_id | text1 | text2
page_22 | page_22-b-1 | いつれの御時にか女御更衣あまたさふらひ給けるなかにいとやむことなきゝは | いつれの御時にか女御更衣あまたさふらひたまふなかにいとやむことなきゝは

In the example above, text1 reads “給ける” where text2 reads “たまふ”.
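
When the two columns are long, the differing span can be located programmatically. This is a small sketch using difflib from the Python standard library; the helper name differing_spans is hypothetical, and the reported offset is zero-based.

```python
from difflib import SequenceMatcher

text1 = "いつれの御時にか女御更衣あまたさふらひ給けるなかにいとやむことなきゝは"
text2 = "いつれの御時にか女御更衣あまたさふらひたまふなかにいとやむことなきゝは"


def differing_spans(orig, reg):
    """Return (offset in orig, orig span, reg span) for each non-equal region."""
    matcher = SequenceMatcher(None, orig, reg, autojunk=False)
    return [
        (i1, orig[i1:i2], reg[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for offset, orig_span, reg_span in differing_spans(text1, text2):
    print(offset, orig_span, reg_span)
```

For the sample line this reports a single difference starting at offset 19, that is, at the 20th character.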

“notes” Sheet

This describes annotation information.

In addition to the previously created page_id and line_id, a new note_id is added. The pos field specifies at which character position in the line the annotation should be placed. type and subtype are optional. text provides the content of the annotation. image is optional and provides the IIIF image URL for the annotation. How to obtain this URL is described later.

note_id | page_id | line_id | pos | type | subtype | text | image
page_22-b-1-20 | page_22 | page_22-b-1 | 22 | 校異 | | 給けるーたまふ河 | https://dl.ndl.go.jp/api/iiif/3437686/R0000022/1044,895,82,424/full/0/default.jpg
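
To illustrate how pos might be used, the sketch below splits a line at the annotation point. It assumes that the pos value in the sample row is 22 and that pos counts characters from the start of the line, so that the annotation follows the first 22 characters. Both points are my reading of the sample, and the helper name split_at_pos is hypothetical.

```python
def split_at_pos(text, pos):
    """Split a line into the parts before and after the annotation point.

    Assumes pos is the number of characters that precede the annotation.
    """
    if not 0 <= pos <= len(text):
        raise ValueError(f"pos {pos} is outside the line (length {len(text)})")
    return text[:pos], text[pos:]


text1 = "いつれの御時にか女御更衣あまたさふらひ給けるなかにいとやむことなきゝは"
before, after = split_at_pos(text1, 22)
print(before[-3:])  # last three characters before the annotation
print(after[:3])    # first three characters after the annotation
```

Under these assumptions the annotation lands directly after “給ける”, which matches the variant recorded in the text column.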

Converting to TEI/XML

The notebook for uploading Excel and downloading the TEI/XML file is available here.

https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/tei_excel_tools.ipynb
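
For readers who want a feel for what such a conversion involves, here is a minimal sketch built on xml.etree.ElementTree. It is not the notebook's code. The attribute choices (n and facs on pb, xml:id on lb and note) and the placement of the note inside orig are assumptions made for illustration, and only the body for one page, one line, and one note is built.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_body(page, line, note):
    """Build a TEI body element for one page, one line, and one note."""
    body = ET.Element(tei("body"))
    p = ET.SubElement(body, tei("p"))
    # Page break; pointing facs at "#<page_id>" is an assumption.
    ET.SubElement(p, tei("pb"), {"n": page["label"], "facs": "#" + page["page_id"]})
    # Line beginning that carries the line ID.
    ET.SubElement(p, tei("lb"), {XML_ID: line["line_id"]})
    choice = ET.SubElement(p, tei("choice"))
    orig = ET.SubElement(choice, tei("orig"))
    reg = ET.SubElement(choice, tei("reg"))
    reg.text = line["text2"]
    # Place the note inside orig after the first `pos` characters (assumed meaning of pos).
    pos = note["pos"]
    orig.text = line["text1"][:pos]
    note_el = ET.SubElement(orig, tei("note"), {XML_ID: note["note_id"], "type": note["type"]})
    note_el.text = note["text"]
    note_el.tail = line["text1"][pos:]
    return body


page = {"page_id": "page_22", "label": "[22]"}
line = {
    "line_id": "page_22-b-1",
    "text1": "いつれの御時にか女御更衣あまたさふらひ給けるなかにいとやむことなきゝは",
    "text2": "いつれの御時にか女御更衣あまたさふらひたまふなかにいとやむことなきゝは",
}
note = {"note_id": "page_22-b-1-20", "pos": 22, "type": "校異", "text": "給けるーたまふ河"}

xml_text = ET.tostring(build_body(page, line, note), encoding="unicode")
print(xml_text)
```

A complete file would also need a teiHeader and the image information from the image sheet, which this sketch leaves out.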

Obtaining the IIIF Image URL for Annotations

This explains how to obtain the IIIF image URL for annotations mentioned earlier. We use the IIIF Curation Viewer created by the Center for Open Data in the Humanities.

Open the image that contains the annotation by setting the manifest and pos parameters in a URL of the following form.

http://codh.rois.ac.jp/software/iiif-curation-viewer/demo/?manifest=https://dl.ndl.go.jp/api/iiif/3437686/manifest.json&pos=22&lang=ja
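
If many pages need to be opened, the URL can be assembled in code. The sketch below reproduces the URL format shown above with urllib.parse; the function name viewer_url is hypothetical, and only the manifest, pos, and lang parameters that appear in the example are used.

```python
from urllib.parse import urlencode

VIEWER_BASE = "http://codh.rois.ac.jp/software/iiif-curation-viewer/demo/"


def viewer_url(manifest, pos, lang="ja"):
    """Build a viewer URL for the page at position `pos` of a manifest."""
    # safe=":/" keeps the manifest URL readable, as in the example above.
    query = urlencode({"manifest": manifest, "pos": pos, "lang": lang}, safe=":/")
    return f"{VIEWER_BASE}?{query}"


url = viewer_url("https://dl.ndl.go.jp/api/iiif/3437686/manifest.json", 22)
print(url)
```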

Next, click the button shown in the red box in the figure below to select the annotation area.

Then, click on the annotation area and the URL will be displayed.

Paste this URL into the Excel file.
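
Before pasting, it may be worth checking that the copied URL really describes a cropped region. The following sketch splits a URL according to the IIIF Image API pattern (region/size/rotation/quality.format). It assumes the region is given as pixel values x,y,w,h, as in the sample row of the notes sheet, and the function name parse_iiif_image_url is hypothetical.

```python
from urllib.parse import urlparse


def parse_iiif_image_url(url):
    """Split the trailing IIIF Image API segments of an image URL."""
    parts = urlparse(url).path.rstrip("/").split("/")
    region, size, rotation, filename = parts[-4:]
    quality, fmt = filename.rsplit(".", 1)
    # Assumes a pixel region "x,y,w,h"; "full" or "pct:" regions are not handled.
    x, y, w, h = (int(value) for value in region.split(","))
    return {
        "region": (x, y, w, h),
        "size": size,
        "rotation": rotation,
        "quality": quality,
        "format": fmt,
    }


info = parse_iiif_image_url(
    "https://dl.ndl.go.jp/api/iiif/3437686/R0000022/1044,895,82,424/full/0/default.jpg"
)
print(info["region"])
```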

Summary

This is a use-case-specific method for creating TEI/XML files, but I hope it serves as a helpful reference.