Overview#
“Digital Genji Monogatari” is a site that aims to propose an environment for supporting research on The Tale of Genji, as well as education and research that use classical texts, by collecting and creating related data about The Tale of Genji and linking these data together.
https://genji.dl.itc.u-tokyo.ac.jp/
One of the features provided by this site is the “alignment of the Collated Tale of Genji with modern Japanese translations.” Corresponding passages in the “Collated Tale of Genji” and in Yosano Akiko’s translation published on Aozora Bunko are highlighted together.
This article explains the procedure for implementing the above functionality.
Data#
The following type of data is created.
https://genji.dl.itc.u-tokyo.ac.jp/data/tei/koui/54.xml
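Anchor tags are used to map positions in the text data of the “Collated Tale of Genji” to file-and-ID pairs in Yosano Akiko’s translation. The corresp attribute of each anchor holds the URL of the translation file followed by the ID of the corresponding passage.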
```xml
<text>
  <body>
    <p>
      <lb/>
      <pb facs="#zone_2055" n="2055"/>
      <lb/>
      <seg corresp="https://w3id.org/kouigenjimonogatari/api/items/2055-01.json">
        <anchor corresp="https://genji.dl.itc.u-tokyo.ac.jp/api/items/tei/yosano/56.xml#YG5600000300"/>
        やまにおはしてれいせさせ給やうに経仏なとくやうせさせ給
        <anchor corresp="https://genji.dl.itc.u-tokyo.ac.jp/api/items/tei/yosano/56.xml#YG5600000400"/>
        又の日はよかはに
      </seg>
      <lb/>
```

The following tool was developed and used for creating this data.
https://github.com/tei-eaj/parallel_text_editor
Unfortunately, it is not functional as of 2024-01-07, but you can see how it works in the following video. I plan to improve this tool in the future.
https://youtu.be/hOp_PxYUrZk
As a result of the above work, Google Documents like the following are created.
https://docs.google.com/document/d/1DxKItyugUIR3YYUxlwH5-SRVA_eTj7Gh_LJ4A0mzCg8/edit?usp=sharing
For each line of the “Collated Tale of Genji,” the corresponding Yosano Akiko translation ID is inserted in the format \[YG(\d+)\].
```
2055-01 [YG5600000300]やまにおはしてれいせさせ給やうに経仏なとくやうせさせ給[YG5600000400]又の日はよかはに
2055-02 おはしたれはそうつおとろきかしこまりきこえ給[YG5600000500]としころ御いのりなとつけか
2055-03 たらひたまひけれとことにいとしたしきことはなかりけるをこのたひ一品の宮
2055-04 の御心ちのほとにさふらひ給へるにすくれたまへるけん物し給けりとみたまひ
2055-05 てよりこよなうたうとひたまひていますこしふかきちきりくはへ給てけれはお
2055-06 も〱しくおはするとのゝかくわさとおはしましたることゝもてさはきゝこえ
2055-07 給[YG5600000600]御物かたりなとこまやかにしておはすれは御ゆつけなとまいり給[YG5600000700]すこし人
2055-08 〱しつまりぬるに[YG5600000800]をのゝわたりにしり給へるやとりや侍とゝひ給へはしか侍
```
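As a minimal sketch of how one such line could be split into its line ID and the text runs that follow each translation ID, consider the following. The helper name `split_aligned_line`, the single space after the line ID, and the shape of the returned tuples are assumptions made for illustration; only the `[YG(\d+)]` marker format and the sample line are taken from the data above.

```python
import re

# Marker format described above: "[YG" + digits + "]".
YG_MARKER = re.compile(r"\[YG(\d+)\]")


def split_aligned_line(line):
    """Split '2055-01 [YG...]text...' into (line_id, [(yg_id, text), ...]).

    yg_id is None for text that precedes the first marker on the line.
    Hypothetical helper, not the author's actual implementation.
    """
    # Assumption: the line ID and the text are separated by one space.
    line_id, _, body = line.partition(" ")
    # re.split with one capture group yields: text, id, text, id, text, ...
    parts = YG_MARKER.split(body)
    runs = []
    if parts[0]:
        runs.append((None, parts[0]))
    for i in range(1, len(parts), 2):
        runs.append(("YG" + parts[i], parts[i + 1]))
    return line_id, runs


sample = (
    "2055-01 [YG5600000300]やまにおはしてれいせさせ給やうに経仏なとくやうせさせ給"
    "[YG5600000400]又の日はよかはに"
)
line_id, runs = split_aligned_line(sample)
print(line_id)
for yg_id, text in runs:
    print(yg_id, text)
```

Text that appears before the first marker on a line continues the passage opened on an earlier line, which is why it is returned with `None` instead of being dropped.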
Google Documents for each volume of The Tale of Genji are saved in Google Drive.
https://drive.google.com/drive/folders/1QgS4z_5vk8AEz95iA3q7j41-U3oDdfpx
Processing#
Retrieving the List of File Names and IDs from Google Drive#
Connecting to Google Drive
```python
#| export
import os.path
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

#| export
class GoogleDriveClient:
    def __init__(self, credential_path):
        # If modifying these scopes, delete the file token.json.
        SCOPES = [
            "https://www.googleapis.com/auth/drive.metadata.readonly"
        ]
        creds = None
        # The file token.json stores the user's access and refresh tokens, and is
        # created automatically when the authorization flow completes for the first
        # time.
        if os.path.exists("token.json"):
            creds = Credentials.from_authorized_user_file("token.json", SCOPES)
        # If there are no (valid) credentials available, let the user log in.
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                flow = InstalledAppFlow.from_client_secrets_file(
                    credential_path, SCOPES
                )
                creds = flow.run_local_server(port=0)
            # Save the credentials for the next run
            with open("token.json", "w") as token:
                token.write(creds.to_json())
        try:
            service = build('drive', 'v3', credentials=creds)
            self.drive_service = service
        except HttpError as e:
            print(e)
            print("Error while creating API client")
            raise e
```

Retrieving the list
```python
import json

# credential_path: path to the OAuth client secrets JSON file, set beforehand
client = GoogleDriveClient(credential_path)
service = client.drive_service
# Call the Drive v3 API
results = service.files().list(
    q="'1QgS4z_5vk8AEz95iA3q7j41-U3oDdfpx' in parents",
    pageSize=100, fields="nextPageToken, files(id, name)").execute()
items = results.get('files', [])
config = {}
if not items:
    print('No files found.')
else:
    for item in items:
        # Map each file name (volume number) to its Google Drive file ID
        config[item['name']] = item['id']
with open("data/config.json", "w") as f:
    json.dump(config, f, indent=4)
```
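The call above requests `nextPageToken` but reads only the first page, which is enough as long as the folder holds no more files than `pageSize`. As a hedged sketch of what following the token could look like, the helper below keeps requesting pages until no token is returned. `list_all_files` is a hypothetical name, `service` is assumed to be a Drive v3 service object like the one built earlier, and the `_Fake*` classes are stand-ins that exist only so the sketch runs without credentials.

```python
def list_all_files(service, query, page_size=100):
    """Collect every file matching `query`, following nextPageToken.

    Hypothetical helper: `service` is assumed to be a Drive v3 service object.
    """
    files = []
    page_token = None
    while True:
        params = {
            "q": query,
            "pageSize": page_size,
            "fields": "nextPageToken, files(id, name)",
        }
        if page_token:
            params["pageToken"] = page_token
        response = service.files().list(**params).execute()
        files.extend(response.get("files", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return files


class _FakeRequest:
    """Stand-in for the request object returned by files().list()."""

    def __init__(self, payload):
        self._payload = payload

    def execute(self):
        return self._payload


class _FakeFiles:
    """Stand-in for service.files() that serves two fixed pages."""

    def list(self, **kwargs):
        if kwargs.get("pageToken") is None:
            return _FakeRequest(
                {"files": [{"id": "a", "name": "01"}], "nextPageToken": "t1"}
            )
        return _FakeRequest({"files": [{"id": "b", "name": "02"}]})


class _FakeService:
    def files(self):
        return _FakeFiles()


all_files = list_all_files(_FakeService(), "'FOLDER_ID' in parents")
config = {f["name"]: f["id"] for f in all_files}
print(config)
```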
Processing Each Google Document#
Based on the file names (volume numbers) and IDs obtained above, processing is performed on each Google Document.
This processing creates the XML data of the Collated Tale of Genji with the anchor tags introduced at the beginning of this article.
The original XML data of the Collated Tale of Genji is published at the following location.
https://kouigenjimonogatari.github.io/
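To make the target structure concrete, here is a minimal sketch of building one seg element whose text is interleaved with anchor elements. The helper `build_seg`, its parameters, and the `(yg_id, text)` input layout are assumptions for illustration, not the author's code. The two URL prefixes are copied from the data excerpt shown earlier, and the TEI namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

# URL prefixes as they appear in the published data excerpt.
SEG_BASE = "https://w3id.org/kouigenjimonogatari/api/items/"
YOSANO_BASE = "https://genji.dl.itc.u-tokyo.ac.jp/api/items/tei/yosano/"


def build_seg(line_id, runs, yosano_vol):
    """Build a <seg> whose text is interleaved with <anchor/> elements.

    `runs` is a list of (yg_id or None, text) pairs in document order.
    Hypothetical helper, not the author's actual implementation.
    """
    seg = ET.Element("seg", {"corresp": f"{SEG_BASE}{line_id}.json"})
    for yg_id, text in runs:
        if yg_id is None:
            # Text before the first marker belongs directly to <seg>.
            seg.text = (seg.text or "") + text
            continue
        anchor = ET.SubElement(
            seg, "anchor", {"corresp": f"{YOSANO_BASE}{yosano_vol}.xml#{yg_id}"}
        )
        # In ElementTree, text that follows an element is stored in its tail.
        anchor.tail = text
    return seg


seg = build_seg(
    "2055-01",
    [("YG5600000300", "やまにおはして"), ("YG5600000400", "又の日はよかはに")],
    yosano_vol=56,
)
xml_text = ET.tostring(seg, encoding="unicode")
print(xml_text)
```

Storing the following text in each anchor's `tail` keeps the anchors as empty milestone elements, which matches the shape of the published data.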
The following function was created for formatting the XML data.
```python
from xml.dom import minidom

def pretty(self, xml_string: str) -> str:
    """
    Pretty prints the XML string.
    :param xml_string: XML string to pretty print.
    :return: Pretty printed XML string.
    """
    # Build a DOM tree from the string.
    dom = minidom.parseString(xml_string)
    # Get the formatted XML.
    pretty_xml_as_string = dom.toprettyxml()
    # Remove blank lines.
    pretty_xml_as_string = '\n'.join([line for line in pretty_xml_as_string.split('\n') if line.strip()])
    return pretty_xml_as_string
```

When Beautiful Soup’s prettify() method was used instead, as shown below, the output appeared to contain unnecessary line breaks.
```python
from bs4 import BeautifulSoup

# Create a BeautifulSoup object (html_content holds the markup to format)
soup = BeautifulSoup(html_content, "html.parser")
# Get the formatted HTML and print it
pretty_html = soup.prettify()
print(pretty_html)
```
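As a small illustration of why the formatting function filters out blank lines, the sketch below runs `minidom` on input that is already indented. The whitespace between tags is parsed as text nodes, so `toprettyxml()` emits whitespace-only lines, and the filter removes them. The sample XML and the helper name `strip_blank_lines` are assumptions for illustration.

```python
from xml.dom import minidom


def strip_blank_lines(text):
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in text.split("\n") if line.strip())


# Already-indented input: the whitespace between tags becomes text nodes.
source = "<p>\n  <seg>text</seg>\n</p>"
raw = minidom.parseString(source).toprettyxml(indent="  ")
cleaned = strip_blank_lines(raw)

print(repr(raw))
print(cleaned)
```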
Summary#
I have documented the steps needed for aligning the Collated Tale of Genji with modern Japanese translations in Digital Genji Monogatari.