Running LEAF-Writer in a Local Environment

Overview I had the opportunity to run LEAF-Writer in a local environment, so here are my notes. Repository The following repository is used. https://gitlab.com/calincs/cwrc/leaf-writer/leaf-writer Method

    git clone https://gitlab.com/calincs/cwrc/leaf-writer/leaf-writer
    cd leaf-writer
    npm i
    npm run dev

LEAF-Writer starts on port 3000. ...

June 26, 2024 · 1 min · Nakamura

Examining the Contents of the DHC Format

Overview At the annual conferences of Digital Humanities and The Japanese Association for Digital Humanities (JADH), it is common to use a tool called dhconvalidator to convert DOCX or ODT files into DHC files for submission. https://github.com/ADHO/dhconvalidator This article is a note for understanding this format. Examining the Contents DHC files are described as follows. This is essentially a ZIP archive containing their original ODT/DOCX file, an HTML rendering and an XML-TEI rendering, plus a folder with the image files, properly renamed. ...
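
Because a DHC file is described as essentially a ZIP archive, its entries can be listed with Python's standard zipfile module. The following is a minimal sketch under that assumption. The function names and the entry names in the sample archive are hypothetical, and a real DHC file produced by dhconvalidator may be organized differently.

```python
import io
import zipfile


def list_archive_entries(source):
    """Return the entry names of a ZIP-based file (a path or a file-like object)."""
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def build_sample_archive():
    """Build a tiny in-memory ZIP that stands in for a DHC file.

    The entry names are invented for illustration only.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("paper.docx", b"")
        archive.writestr("paper.html", "<html></html>")
        archive.writestr("paper.xml", "<TEI/>")
        archive.writestr("Pictures/image1.png", b"")
    buffer.seek(0)
    return buffer


for name in list_archive_entries(build_sample_archive()):
    print(name)
```

To inspect an actual submission, pass the path of the DHC file to list_archive_entries instead of the in-memory sample.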

June 16, 2024 · 4 min · Nakamura

Trying cwrc's wikidata-entity-lookup

Overview This is a continuation of the following article. One of the features of LEAF-Writer is described as follows: the ability to look up and select identifiers for named entity tags (persons, organizations, places, or titles) from the following Linked Open Data authorities: DBPedia, Geonames, Getty, LGPN, VIAF, and Wikidata. This feature uses libraries such as the following. https://github.com/cwrc/wikidata-entity-lookup I tried out this feature. Usage npm packages are published at the following locations. ...

May 16, 2024 · 3 min · Nakamura

Trying the CWRC XML Validator API

Overview LEAF-Writer is one of the editors available for TEI/XML. https://leaf-writer.leaf-vre.org/ It is described as follows: The XML & RDF online editor of the Linked Editing Academic Framework The GitLab repository is below. https://gitlab.com/calincs/cwrc/leaf-writer/leaf-writer One of the features of this tool is described as: continuous XML validation This validation appears to use the following API. https://validator.services.cwrc.ca/ The library seems to be: https://www.npmjs.com/package/@cwrc/leafwriter-validator This time, I tried the above API. ...

May 16, 2024 · 6 min · Nakamura

RELAX NG and Schematron

Overview When creating TEI/XML with oXygen XML Editor, the following template is generated.

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
    <?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
      schematypens="http://purl.oclc.org/dsdl/schematron"?>
    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc>
          <titleStmt>
            <title>Title</title>
          </titleStmt>
          <publicationStmt>
            <p>Publication Information</p>
          </publicationStmt>
          <sourceDesc>
            <p>Information about the source</p>
          </sourceDesc>
        </fileDesc>
      </teiHeader>
      <text>
        <body>
          <p>Some text here.</p>
        </body>
      </text>
    </TEI>

I was curious about the following difference, so I am sharing the results of querying GPT-4. ...
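
The template carries two xml-model processing instructions that point to the same tei_all.rng file but declare different schema languages through schematypens. As a minimal sketch, the standard xml.dom.minidom module can list these associations. The function name, the label table, and the shortened sample document are assumptions made for illustration.

```python
import re
from xml.dom import minidom

# Shortened sample: the same two xml-model lines as the template, with an empty TEI root.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"/>
"""

# Human-readable labels for the two schematypens values used in the template.
SCHEMA_LANGUAGES = {
    "http://relaxng.org/ns/structure/1.0": "RELAX NG",
    "http://purl.oclc.org/dsdl/schematron": "Schematron",
}


def list_schema_associations(xml_text):
    """Return (schema language, href) pairs for each xml-model instruction."""
    document = minidom.parseString(xml_text.encode("utf-8"))
    associations = []
    for node in document.childNodes:
        is_pi = node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        if is_pi and node.target == "xml-model":
            # The instruction body is a list of pseudo-attributes: name="value".
            pseudo_attrs = dict(re.findall(r'([\w:-]+)\s*=\s*"([^"]*)"', node.data))
            namespace = pseudo_attrs.get("schematypens", "")
            label = SCHEMA_LANGUAGES.get(namespace, namespace)
            associations.append((label, pseudo_attrs.get("href")))
    return associations


for language, href in list_schema_associations(SAMPLE):
    print(language, href)
```

The first instruction asks a validator to apply the RELAX NG grammar itself, while the second asks it to apply any Schematron rules embedded in the same file.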

May 16, 2024 · 8 min · Nakamura

Using the Docker Version of TEI Publisher

Overview I had an opportunity to use the Docker version of TEI Publisher, so here are my notes. https://teipublisher.com/exist/apps/tei-publisher-home/index.html TEI Publisher is described as follows. TEI Publisher facilitates the integration of the TEI Processing Model into exist-db applications. The TEI Processing Model (PM) extends the TEI ODD specification format with a processing model for documents. That way intended processing for all elements can be expressed within the TEI vocabulary itself. It aims at the XML-savvy editor who is familiar with TEI but is not necessarily a developer. ...

May 15, 2024 · 3 min · Nakamura

Formatting XML Strings in Python

Overview Notes on programs for formatting XML strings in Python. Program 1 I referenced the following. https://hawk-tech-blog.com/python-learn-prettyprint-xml/ I added processing to remove unnecessary blank lines.

    from xml.dom import minidom
    import re

    def prettify(rough_string):
        reparsed = minidom.parseString(rough_string)
        # Pretty-print, then drop lines that consist only of whitespace
        pretty = re.sub(r"[\t ]+\n", "", reparsed.toprettyxml(indent="\t"))
        # Remove unnecessary line breaks between tags
        pretty = pretty.replace(">\n\n\t<", ">\n\t<")
        # Collapse consecutive line breaks (including blank lines) into one
        pretty = re.sub(r"\n\s*\n", "\n", pretty)
        return pretty

Program 2 I referenced the following. ...
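
As a point of comparison, and not one of the programs from this article, Python 3.9 and later ship xml.etree.ElementTree.indent, which re-indents a parsed tree in place and replaces whitespace-only text, so no blank-line cleanup is needed afterwards. A minimal sketch, with pretty_xml as a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def pretty_xml(xml_string, space="  "):
    """Return xml_string re-indented with ElementTree (requires Python 3.9+)."""
    root = ET.fromstring(xml_string)
    # indent() rewrites whitespace-only text and tail values in place.
    ET.indent(root, space=space)
    return ET.tostring(root, encoding="unicode")


print(pretty_xml("<root><title>aaa</title></root>"))
```

Note that ElementTree drops the XML declaration and any comments outside the root element, so the minidom approach remains useful when those must be preserved.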

May 9, 2024 · 4 min · Nakamura

Parsing XML Strings in Node.js

Overview To parse XML strings and extract information from them in Node.js, I recommend using the xmldom library. This allows you to work with XML in a way similar to how you manipulate the DOM in a browser. Below is how to set up a function to parse XML and extract elements, focusing on “PAGE” tags, using xmldom. Install the xmldom library: First, install xmldom, which is needed to parse XML strings.

    npm install xmldom

Use xmldom to parse XML and extract the required elements.

    const { DOMParser } = require('xmldom');
    const xmlString = "...";
    // Parse the XML string using DOMParser
    const parser = new DOMParser();
    const xmlDoc = parser.parseFromString(xmlString, 'text/xml');
    // Get all PAGE elements
    const pages = xmlDoc.getElementsByTagName('PAGE');
    // Log the number of PAGE elements found (example)
    console.log('Number of PAGE elements:', pages.length);

In this example, the basic function logs the XML string, parses it into a document, iterates over each “PAGE” element, and logs its attributes and content. The processing within the loop can be customized based on specific requirements, such as extracting particular details from each page. ...
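
The per-element loop described above (reading each PAGE element's attributes and content) can be sketched with Python's xml.dom.minidom, which exposes the same DOM-style getElementsByTagName call. This is an illustration in a different language, not the article's xmldom code, and the sample XML with its NUMBER attribute is invented.

```python
from xml.dom import minidom

# Hypothetical sample; the element layout and the NUMBER attribute are invented.
XML_STRING = (
    '<BOOK><PAGE NUMBER="1">first</PAGE><PAGE NUMBER="2">second</PAGE></BOOK>'
)


def extract_pages(xml_string):
    """Return (attributes, text) for every PAGE element in the document."""
    document = minidom.parseString(xml_string)
    pages = []
    for page in document.getElementsByTagName("PAGE"):
        attributes = dict(page.attributes.items())
        # Join only the direct text nodes of the element.
        text = "".join(
            node.data for node in page.childNodes if node.nodeType == node.TEXT_NODE
        )
        pages.append((attributes, text))
    return pages


for attributes, text in extract_pages(XML_STRING):
    print(attributes, text)
```

The body of the loop is the part to customize, for example by keeping only one attribute or by descending into child elements.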

April 24, 2024 · 2 min · Nakamura

TEI/XML Visualization Example: Map Display Using Leaflet

Overview For visualizing TEI/XML files, I created a repository that publishes visualization examples and source code. https://github.com/nakamura196/tei_visualize_demo You can see the visualization examples on the following page. https://nakamura196.github.io/tei_visualize_demo/ This time, I added an example of marker display using MarkerCluster, which I’ll introduce here. Prerequisites This assumes that you can already display markers using Leaflet (without using MarkerCluster). If you haven’t done so yet, please refer to the following visualization example and source code. ...

April 12, 2024 · 5 min · Nakamura

Aligning the Collated Tale of Genji with Modern Japanese Translations in Digital Genji Monogatari

Overview “Digital Genji Monogatari” is a site that aims to propose an environment to support research on The Tale of Genji as well as education and research activities using classical texts, by collecting and creating various related data about The Tale of Genji and linking them together. https://genji.dl.itc.u-tokyo.ac.jp/ One of the features provided by this site is the “alignment of the Collated Tale of Genji with modern Japanese translations.” As shown below, the corresponding sections between the “Collated Tale of Genji” and Yosano Akiko’s translation published on Aozora Bunko are highlighted. ...

January 7, 2024 · 16 min · Nakamura

Usage Example of the Image Map Editor in Oxygen XML Editor

Overview This is an explanation of how to use the Image Map Editor in Oxygen XML Editor. Video https://youtu.be/9dZQ1v0Rky0?si=8EhAZdVsLqgPz2Rf Usage Prepare a TEI/XML file like the following. The url value of <graphic> can specify a relative path from the file, an absolute path on your PC, or a URL published on the internet. In the following example, the file digidepo_3437686_pn_null_9c48d89b-e2ec-4593-8d00-6fbc1d29d1bd.jpg stored in the same folder as the TEI/XML file is referenced. ...

December 12, 2023 · 7 min · Nakamura

Formatting and Syntax Highlighting XML in Nuxt3

Overview I had the opportunity to display XML text data using Nuxt3, as shown in the following image, so this is a memo. Installation I used the following two libraries.

    npm i xml-formatter
    npm i highlight.js

Usage I created the following file as a Nuxt3 component. It formats XML strings with xml-formatter and then applies syntax highlighting with highlight.js. ...

November 6, 2023 · 5 min · Nakamura

Mirador 3 Plugin Development: Adding Vertical Text Support to the Text Overlay Plugin

Overview Text Overlay plugin for Mirador 3 is a Mirador 3 plugin that displays selectable text overlays based on OCR or transcription. https://github.com/dbmdz/mirador-textoverlay A demo page is available at the following link. https://mirador-textoverlay.netlify.app/ However, when trying to display vertical text such as Japanese, it didn’t display correctly, as shown below. So I forked the above repository and made it possible to display vertical text as well. The source code is published in the following repository. (I hope to consider a pull request in the future.) ...

August 22, 2023 · 9 min · Nakamura

About ALTO (Analyzed Layout and Text Object) XML

Overview I am sharing the results of querying GPT-4 about ALTO (Analyzed Layout and Text Object) XML. https://www.loc.gov/standards/alto/ Required Elements ALTO (Analyzed Layout and Text Object) XML is an XML schema for representing OCR-generated text and its layout. Its structure is very flexible, with many elements and attributes, but the required elements are limited. The simplest form of ALTO XML has the following hierarchical structure: <alto>: The root element. It must have @xmlns and @xmlns:xsi attributes indicating the version of the ALTO XML schema. It must also have two child elements: <Description> and <Layout>. ...
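
The structure described above, an alto root with Description and Layout children, can be checked with a few lines of standard-library Python. This is a minimal sketch rather than schema validation. The function names are hypothetical, the sample document and its namespace URI are assumptions for illustration, and the list of expected children simply follows the description in this article.

```python
import xml.etree.ElementTree as ET

# Illustrative sample; the namespace URI is assumed to be that of ALTO version 4.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Description/>
  <Layout>
    <Page/>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def missing_top_level_elements(xml_text, expected=("Description", "Layout")):
    """Return the expected child elements that the alto root does not contain."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "alto":
        raise ValueError("root element is not <alto>")
    present = {local_name(child.tag) for child in root}
    return [name for name in expected if name not in present]


print(missing_top_level_elements(SAMPLE_ALTO))
```

For real conformance checking, validating against the published ALTO XSD is the more reliable route, since this sketch only looks at element names.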

July 31, 2023 · 5 min · Nakamura

Prototype of an XML File Validation Tool Using JPCOAR Schema (v1)

I previously wrote the following article, where I tried validating XML files using the JPCOAR schema. This time, based on the verification from the above article, I created a validation tool using Google Colab. You can try it at the following URL. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/JPCOARスキーマ_v1を用いたxmlファイルのバリデーション.ipynb You can validate target files by specifying the URL of a published XML file or by uploading a local file. I hope this serves as a helpful reference when creating XML files using the JPCOAR Schema (v1). ...

April 19, 2023 · 1 min · Nakamura

Collaborative Editing of TEI/XML Files Using Visual Studio Live Share (Not Limited to XML)

Overview Visual Studio Live Share is a VSCode extension that enables real-time collaborative development. https://visualstudio.microsoft.com/ja/services/live-share/ This time, we will try real-time collaborative editing of TEI/XML files using this extension. Demo Video A video of the collaborative editing was recorded. https://youtu.be/DzyuJAtzl90 The right side of the screen shows a user (nakamura196) using VSCode in a local environment, while the left side shows a user (Guest User) invited via Visual Studio Live Share editing using the online VSCode (vscode.dev). ...

January 19, 2023 · 3 min · Nakamura

Validating XML Files Using the JPCOAR Schema

Overview JPCOAR Schema publishes XML Schema Definitions in the following repository. Thank you for creating the schema and making the data available. https://github.com/JPCOAR/schema This article is a memo of trying XML file validation using the above schema. (Since this is my first time doing this kind of validation, it may contain inaccurate terminology or information. I apologize.) A Google Colab notebook is also prepared. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/JPCOARスキーマを用いたxmlファイルのバリデーション.ipynb Preparation Clone the repository

    cd /content/
    git clone https://github.com/JPCOAR/schema.git

Install the library ...

January 19, 2023 · 10 min · Nakamura

Trying the jingtrang Library for RELAX NG Schema: Creating RNG Files

Overview In the following article, I performed XML file validation using jingtrang and RNG files. Since this jingtrang library can create RNG files from XML files, I decided to try it out. I also prepared a Google Colab notebook. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/jingtrangを試す:作成編.ipynb Creating an RNG File As the source file for creating the RNG file, I prepared the following:

    <root><title>aaa</title></root>

For the above file, execute the following: ...

January 18, 2023 · 4 min · Nakamura

Trying the jingtrang Library for RELAX NG Schema: Validation

Overview I had an opportunity to create an XML file conforming to a specific schema, and needed to verify that the XML file matched the schema. To meet this requirement, I tried the jingtrang library for working with RELAX NG schemas, so here are my notes: https://pypi.org/project/jingtrang/ I also prepared a Google Colab notebook: https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/jingtrangを試す.ipynb Trying Validation

    # Install the library
    pip install jingtrang
    # Download the RNG file (tei_all is used)
    wget https://raw.githubusercontent.com/nakamura196/test2021/main/tei_all.rng
    # Prepare the XML file to validate (download a Collated Tale of Genji text)
    wget https://kouigenjimonogatari.github.io/tei/01.xml

Passing Example Running the following produced no output: ...

January 18, 2023 · 3 min · Nakamura

Double-Sided Ruby Annotations Using python-docx

This is a memo on how to achieve double-sided ruby (furigana) in Word using python-docx. You can try it from the following notebook. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/python_docxを用いた両側ルビ.ipynb An output example is shown below. An input example is shown below.

    <body>
      <p>
        私は
        <ruby>
          <rb>
            <ruby>
              <rb>打</rb>
              <rt place="right">ダ</rt>
            </ruby>
            <ruby>
              <rb>球</rb>
              <rt place="right">キウ</rt>
            </ruby>
            場
          </rb>
          <rt place="left">ビリヤード</rt>
        </ruby>
        に行きました。
      </p>
      <p>
        <ruby>
          <rb>入学試験</rb>
          <rt place="above">にゅうがくしけん</rt>
        </ruby>
        があります。
      </p>
    </body>

The program is still incomplete, but I hope it serves as a helpful reference. ...
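
Before any Word output is generated, the nested ruby markup of the input has to be read. The following is a minimal sketch of that reading step only, using the standard library. It is not the article's program and does not use python-docx. The function names are hypothetical, and the sample is a compact version of the first paragraph of the input example.

```python
import xml.etree.ElementTree as ET

# Compact version of the first paragraph of the input example.
SAMPLE = (
    "<body><p>私は<ruby><rb>"
    '<ruby><rb>打</rb><rt place="right">ダ</rt></ruby>'
    '<ruby><rb>球</rb><rt place="right">キウ</rt></ruby>'
    '場</rb><rt place="left">ビリヤード</rt></ruby>に行きました。</p></body>'
)


def base_text(rb):
    """Concatenate the base characters of an <rb>, descending into nested <ruby>."""
    parts = [rb.text or ""]
    for child in rb:
        if child.tag == "ruby":
            inner_rb = child.find("rb")
            if inner_rb is not None:
                parts.append(base_text(inner_rb))
        parts.append(child.tail or "")
    # Strip each part so that indented input gives the same result.
    return "".join(part.strip() for part in parts)


def collect_rubies(root):
    """Return (base text, reading, place) for every <ruby> in document order."""
    results = []
    for ruby in root.iter("ruby"):
        rb = ruby.find("rb")
        rt = ruby.find("rt")
        if rb is None or rt is None:
            continue
        results.append((base_text(rb), rt.text, rt.get("place")))
    return results


for base, reading, place in collect_rubies(ET.fromstring(SAMPLE)):
    print(base, reading, place)
```

Each tuple pairs a base string with one reading and its side, so the outer left-side reading and the inner right-side readings of the same characters can be handled separately when writing the document.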

October 4, 2022 · 2 min · Nakamura