Overview

“Digital Genji Monogatari” is a site that collects, creates, and links various data related to The Tale of Genji. Its aim is to propose an environment that supports research on The Tale of Genji, as well as education and research that use classical texts.

https://genji.dl.itc.u-tokyo.ac.jp/

One of the features this site provides is the “alignment of the Collated Tale of Genji with modern Japanese translations.” With this feature, corresponding passages in the “Collated Tale of Genji” and in Yosano Akiko’s translation published on Aozora Bunko are highlighted together.

This article explains the procedure for implementing the above functionality.

Data

The data created for this looks like the following.

https://genji.dl.itc.u-tokyo.ac.jp/data/tei/koui/54.xml

In the text data of the “Collated Tale of Genji,” anchor elements link each passage to a pair of a file and an ID in Yosano Akiko’s translation.

```xml
<anchor corresp="https://genji.dl.itc.u-tokyo.ac.jp/data/tei/yosano/56.xml#YG5600000300"/>
<anchor corresp="https://genji.dl.itc.u-tokyo.ac.jp/data/tei/yosano/56.xml#YG5600000400"/>
```
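As a minimal sketch of how these file/ID pairs can be read back out, the following Python uses only the standard library's xml.etree.ElementTree. It assumes each anchor's corresp value has the form file-URL#ID. The function name anchor_pairs and the sample document (including the TEI namespace wrapper) are illustrative and not taken from the site's code.

```python
import xml.etree.ElementTree as ET


def anchor_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Return (file URL, ID) pairs from every <anchor corresp="file#ID"/>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for elem in root.iter():
        # Compare local names so this works with or without the TEI namespace.
        if elem.tag.split("}")[-1] != "anchor":
            continue
        corresp = elem.get("corresp")
        if corresp and "#" in corresp:
            file_url, _, anchor_id = corresp.partition("#")
            pairs.append((file_url, anchor_id))
    return pairs


# Simplified sample input; real files contain the full text around the anchors.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
<anchor corresp="https://genji.dl.itc.u-tokyo.ac.jp/data/tei/yosano/56.xml#YG5600000300"/>
<anchor corresp="https://genji.dl.itc.u-tokyo.ac.jp/data/tei/yosano/56.xml#YG5600000400"/>
</p></body></text></TEI>"""

for file_url, anchor_id in anchor_pairs(sample):
    print(file_url, anchor_id)
```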

The following tool was developed and used for creating this data.

https://github.com/tei-eaj/parallel_text_editor

Unfortunately, it is not functional as of 2024-01-07, but you can see how it works in the following video. I plan to improve this tool in the future.

https://youtu.be/hOp_PxYUrZk

This work produces Google Docs like the following.

https://docs.google.com/document/d/1DxKItyugUIR3YYUxlwH5-SRVA_eTj7Gh_LJ4A0mzCg8/edit?usp=sharing

Into each line of the “Collated Tale of Genji,” the ID of the corresponding passage in Yosano Akiko’s translation is inserted as a marker matching the regular expression \[YG(\d+)\], for example [YG5600000300].

For example, the lines numbered 2055-01 through 2055-08 contain the markers [YG5600000300] through [YG5600000800].
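Because the markers follow a fixed pattern, they can be pulled out of a line with a regular expression. A minimal Python sketch; split_markers and the sample line are hypothetical, and the line layout (line ID, a space, then the text) is an assumption for illustration.

```python
import re

# Marker format: "[YG" followed by digits and "]", e.g. [YG5600000300].
MARKER = re.compile(r"\[YG(\d+)\]")


def split_markers(line: str) -> tuple[str, list[str]]:
    """Return the line with markers removed, plus the translation IDs it contained."""
    ids = ["YG" + digits for digits in MARKER.findall(line)]
    text = MARKER.sub("", line)
    return text, ids


line = "2055-01 sample text[YG5600000300] more text[YG5600000400]"
print(split_markers(line))
```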

A separate Google Doc for each volume of The Tale of Genji is stored in the following Google Drive folder.

https://drive.google.com/drive/folders/1QgS4z_5vk8AEz95iA3q7j41-U3oDdfpx
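The Drive API identifies this folder by its ID, which is the last path segment of the URL. A small sketch (folder_id_from_url is a hypothetical helper) that extracts the ID and builds the Drive v3 search query "'<folder ID>' in parents", which matches files placed directly in the folder:

```python
from urllib.parse import urlparse


def folder_id_from_url(url: str) -> str:
    """Return the last path segment of a Drive folder URL, i.e. the folder ID."""
    return urlparse(url).path.rstrip("/").split("/")[-1]


url = "https://drive.google.com/drive/folders/1QgS4z_5vk8AEz95iA3q7j41-U3oDdfpx"
folder_id = folder_id_from_url(url)
# Drive v3 search syntax: files whose parent is this folder.
query = f"'{folder_id}' in parents"
print(query)  # '1QgS4z_5vk8AEz95iA3q7j41-U3oDdfpx' in parents
```

Using urlparse means a trailing query string such as ?usp=sharing is ignored automatically.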

Processing

Retrieving the List of File Names and IDs from Google Drive

Connecting to Google Drive

```python
#| export
import os.path

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

#| export
class GoogleDriveClient:
    def __init__(self, credential_path):
        # If modifying these scopes, delete the file token.json.
        SCOPES = [
            "https://www.googleapis.com/auth/drive.metadata.readonly"
        ]
        creds = None
        # The file token.json stores the user's access and refresh tokens, and is
        # created automatically when the authorization flow completes for the first
        # time.
        if os.path.exists("token.json"):
            creds = Credentials.from_authorized_user_file("token.json", SCOPES)
        # If there are no (valid) credentials available, let the user log in.
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                # credential_path points to the OAuth client secrets file
                flow = InstalledAppFlow.from_client_secrets_file(
                    credential_path, SCOPES
                )
                creds = flow.run_local_server(port=0)
            # Save the credentials for the next run
            with open("token.json", "w") as token:
                token.write(creds.to_json())

        try:
            service = build("drive", "v3", credentials=creds)
            self.drive_service = service
        except HttpError as e:
            print(e)
            raise
```

Retrieving the List

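```python
import json

# credential_path: path to the OAuth client secrets file (defined beforehand)
client = GoogleDriveClient(credential_path)
service = client.drive_service

# Call the Drive v3 API: list the files whose parent is the target folder
results = service.files().list(
    q="'1QgS4z_5vk8AEz95iA3q7j41-U3oDdfpx' in parents",
    pageSize=100, fields="nextPageToken, files(id, name)").execute()
items = results.get('files', [])

# Map each file name (volume number) to its file ID
config = {}

if not items:
    print('No files found.')
else:
    for item in items:
        config[item['name']] = item['id']

with open("data/config.json", "w") as f:
    json.dump(config, f, indent=4)
```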
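A single files.list call returns at most one page of results. If the folder ever held more files than fit on one page, the Drive v3 API returns a nextPageToken that can be passed back as pageToken. A sketch of that loop, assuming a service object built with googleapiclient's build("drive", "v3", ...); list_folder_files is a hypothetical helper, not part of the author's code.

```python
def list_folder_files(service, folder_id: str) -> dict[str, str]:
    """Map file name -> file ID for every file whose parent is folder_id.

    Follows nextPageToken so the listing is complete even when it spans
    several pages.
    """
    query = f"'{folder_id}' in parents"
    files: dict[str, str] = {}
    page_token = None  # None on the first request, i.e. start from page one
    while True:
        response = (
            service.files()
            .list(
                q=query,
                pageSize=100,
                fields="nextPageToken, files(id, name)",
                pageToken=page_token,
            )
            .execute()
        )
        for item in response.get("files", []):
            files[item["name"]] = item["id"]
        page_token = response.get("nextPageToken")
        if page_token is None:
            return files
```

Called as list_folder_files(service, "1QgS4z_5vk8AEz95iA3q7j41-U3oDdfpx"), it returns a name-to-ID mapping regardless of how many pages the listing spans.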

Processing Each Google Document

Using the file names (volume numbers) and IDs obtained above, each Google Doc is then processed.

This processing creates the XML data of the Collated Tale of Genji with the anchor tags introduced at the beginning of this article.
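The article does not show this conversion step. Purely as an illustration, the sketch below turns one line containing [YG…] markers into a TEI-style element in which each marker becomes an anchor whose corresp points into the translation file. The yosano/56.xml URL comes from the example data; the seg wrapper element and the function name line_to_tei are assumptions.

```python
import re
import xml.etree.ElementTree as ET

MARKER = re.compile(r"\[YG(\d+)\]")

# Assumption: translation file URL for this volume, as in the example data.
YOSANO_URL = "https://genji.dl.itc.u-tokyo.ac.jp/data/tei/yosano/56.xml"


def line_to_tei(line: str, yosano_url: str = YOSANO_URL) -> ET.Element:
    """Build <seg>text<anchor corresp="url#YG..."/>text...</seg> from one line.

    The <seg> wrapper is a placeholder; the real files may use other elements.
    """
    seg = ET.Element("seg")
    last = None  # most recent anchor; text after it goes into its tail
    pos = 0
    for match in MARKER.finditer(line):
        chunk = line[pos:match.start()]
        if last is None:
            seg.text = (seg.text or "") + chunk
        else:
            last.tail = (last.tail or "") + chunk
        last = ET.SubElement(seg, "anchor", corresp=f"{yosano_url}#YG{match.group(1)}")
        pos = match.end()
    rest = line[pos:]
    if last is None:
        seg.text = (seg.text or "") + rest
    else:
        last.tail = (last.tail or "") + rest
    return seg


print(ET.tostring(line_to_tei("abc[YG5600000300]def"), encoding="unicode"))
```

Putting the text that follows a marker into the anchor's tail is how ElementTree represents mixed content, so the anchors stay at the exact character positions of the markers.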

The original XML data of the Collated Tale of Genji is published at the following location.

https://kouigenjimonogatari.github.io/

(Reference) Formatting XML Data

I wrote the following function to pretty-print the XML data.

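```python
from xml.dom import minidom


def pretty(xml_string: str) -> str:
    """
    Pretty-prints an XML string.

    :param xml_string: XML string
    :return: pretty-printed XML string
    """
    # Parse the XML string into a DOM
    dom = minidom.parseString(xml_string)
    # Serialize the DOM with indentation
    pretty_xml_as_string = dom.toprettyxml()
    # Remove lines that are empty or contain only whitespace
    pretty_xml_as_string = "\n".join(
        [line for line in pretty_xml_as_string.split("\n") if line.strip()]
    )
    return pretty_xml_as_string
```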
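The final filter matters when the input XML is already indented: minidom keeps that whitespace as text nodes, and toprettyxml() writes them out on lines of their own. A short demonstration with a made-up input:

```python
from xml.dom import minidom

# XML that already contains indentation; the whitespace becomes text nodes.
indented = "<a>\n  <b>x</b>\n</a>"

raw = minidom.parseString(indented).toprettyxml()
# toprettyxml() re-indents those whitespace text nodes,
# which leaves lines that contain only whitespace.
print(any(not line.strip() for line in raw.split("\n")))  # True

cleaned = "\n".join(line for line in raw.split("\n") if line.strip())
print(cleaned)
```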

Beautiful Soup’s prettify() method, used as shown below, appeared to insert unnecessary line breaks.

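```python
from bs4 import BeautifulSoup

# Create a BeautifulSoup object (html_content holds the markup to format)
soup = BeautifulSoup(html_content, "html.parser")
# Pretty-printed HTML
pretty_html = soup.prettify()
print(pretty_html)
```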

Summary

I have documented the steps needed for aligning the Collated Tale of Genji with modern Japanese translations in Digital Genji Monogatari.