Overview

This is a personal note on building an API server for searching the Koui Genji Monogatari (Collated Tale of Genji) Text DB.

https://genji-api.aws.ldas.jp/

Background

The following site publishes the text of “Koui Genji Monogatari” as TEI-compliant XML.

https://kouigenjimonogatari.github.io/

I built an API that indexes this text data in Elasticsearch and makes it searchable by section (koma).

Usage

The API documentation, written with OpenAPI and presented with Swagger UI, is available at the following URL.

https://genji-api.aws.ldas.jp/

Key Features

Search Term Expansion

For example, the following URL searches for the keyword “夕顔” (Yugao, “Evening Faces”). The input and output formats conform to JSON:API.

https://genji-api.aws.ldas.jp/search?q=夕顔&page[limit]=20&page[offset]=0&sort=page&filter[expandRepeatMarks]=true&filter[unifyKanjiKana]=true&filter[unifyHistoricalKana]=true&filter[unifyPhoneticChanges]=true&filter[unifyDakuon]=true&filter[vol_str]=04 夕顔
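A request like the one above can also be assembled programmatically. The following is a minimal sketch, assuming only the parameter names visible in the example URL; the helper name build_search_url and its arguments are hypothetical and are not part of the API itself.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Endpoint taken from the example URL in this note.
BASE_URL = "https://genji-api.aws.ldas.jp/search"

# Expansion options as they appear in the example URL.
EXPANSION_OPTIONS = [
    "expandRepeatMarks",
    "unifyKanjiKana",
    "unifyHistoricalKana",
    "unifyPhoneticChanges",
    "unifyDakuon",
]


def build_search_url(keyword, volume=None, limit=20, offset=0, expand=True):
    """Build a search URL (hypothetical helper, not part of the API)."""
    params = {
        "q": keyword,
        "page[limit]": limit,
        "page[offset]": offset,
        "sort": "page",
    }
    # Switch every expansion option on or off together in this sketch.
    for option in EXPANSION_OPTIONS:
        params[f"filter[{option}]"] = "true" if expand else "false"
    if volume is not None:
        params["filter[vol_str]"] = volume
    # urlencode percent-encodes the brackets and the Japanese text as UTF-8.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("夕顔", volume="04 夕顔")
decoded = parse_qs(urlsplit(url).query)
print(url)
```

Percent-encoding the query string keeps the URL safe to pass to any HTTP client; decoding it again, as done with parse_qs here, gives back the original keyword and volume name.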

The request above returns the following results. Variations are generated from the input keyword “夕顔” (Yugao), and the search is run against all of them.

[JSON response omitted: the original listing is garbled beyond recovery. The legible fragments include the option names expandRepeatMarks, unifyKanjiKana, unifyHistoricalKana, unifyPhoneticChanges, and unifyDakuon.]

As a result, occurrences of “ゆふかほ”, “夕かほ”, and “夕顔” appearing in the text can all be searched at once.

Each keyword expansion option can be switched on or off. For details, see the Swagger UI mentioned above.
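The expansion step described above might be sketched as follows. This is only an illustration under stated assumptions: the rule table is made up for this example (the actual rules are served by the API), and the function expand_keyword is hypothetical. It substitutes each character with its alternatives and takes every combination.

```python
from itertools import product

# Illustrative rule table (an assumption, not the API's real rules):
# each key may also be written as any of the listed alternatives.
ILLUSTRATIVE_RULES = {
    "夕": ["ゆふ"],
    "顔": ["かほ"],
}


def expand_keyword(keyword, rules=ILLUSTRATIVE_RULES):
    """Return the keyword plus every variant reachable by per-character substitution."""
    # For each character, the choices are the character itself and its alternatives.
    choices = [[ch] + rules.get(ch, []) for ch in keyword]
    variants = {}
    for combo in product(*choices):
        # A dict keeps first-seen order while removing duplicates.
        variants.setdefault("".join(combo), None)
    return list(variants)


variants = expand_keyword("夕顔")
print(variants)
```

With this toy table the variants include “夕顔”, “夕かほ”, and “ゆふかほ”, the three forms mentioned above, along with “ゆふ顔”. The real service applies separately named options (repeat marks, kanji/kana, historical kana, phonetic changes, dakuon), which this sketch does not model.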

The following OR search is sent to Elasticsearch:

[Elasticsearch query omitted: the original listing is garbled beyond recovery. The legible fragments show nine wildcard clauses on a .keyword field, combined with a minimum_should_match setting.]
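An OR search of this kind can be expressed in the Elasticsearch query DSL as a bool query with one wildcard clause per variant. The sketch below is an assumption-laden illustration, not the query the server actually sends: the field name text.keyword, the use of vol_str as an index field, and the helper build_or_query are all hypothetical.

```python
import json


def build_or_query(variants, field="text.keyword", volume=None, size=20, offset=0):
    """Build a bool/should query body (hypothetical helper; field names are assumed)."""
    # One wildcard clause per variant; "*...*" matches the variant anywhere in the field.
    should = [{"wildcard": {field: {"value": f"*{v}*"}}} for v in variants]
    bool_query = {
        "should": should,
        # At least one variant must match, which makes the clauses an OR search.
        "minimum_should_match": 1,
    }
    if volume is not None:
        # Restrict hits to one volume; the index field name is an assumption.
        bool_query["filter"] = [{"term": {"vol_str": volume}}]
    return {"from": offset, "size": size, "query": {"bool": bool_query}}


query = build_or_query(["夕顔", "夕かほ", "ゆふかほ"], volume="04 夕顔")
print(json.dumps(query, ensure_ascii=False, indent=2))
```

Leading-wildcard patterns can be slow on large indices; for a corpus of this size that trade-off may be acceptable, but it is a design choice worth measuring.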

The rules used for conversion can be checked at the following URL:

https://genji-api.aws.ldas.jp/normalization/rules

[JSON listing of the normalization rules omitted: the original listing is garbled beyond recovery.]

Summary

Although some parts are still incomplete, this note introduced an example of building a search API server with a mechanism for absorbing orthographic variation in the original text.

I hope this serves as a useful reference.