Overview

I built a Gradio app that uses “NDL Kotenseki OCR-Lite”.

You can try it at the following URL.

https://huggingface.co/spaces/nakamura196/ndlkotenocr-lite

“NDL Kotenseki OCR-Lite” is also distributed as a desktop application, so it can be run locally without a web app such as this Gradio one.

The intended use cases for this web app are therefore access from smartphones or tablets, and integration with other systems via the web API.

Development Notes and Bug Fixes

Using Submodules

The original ndlkotenocr-lite repository was added as a Git submodule, with the following .gitmodules entry:

[submodule "ndlkotenocr-lite"]
	path = ndlkotenocr-lite
	url = https://github.com/ndl-lab/ndlkotenocr-lite.git

The following script runs during the build:

#!/bin/bash
# Initialize and update submodules
git submodule update --init --recursive
git submodule update --remote

This should make the build use the latest files from the original ndlkotenocr-lite repository.

(I may be misunderstanding some of this.)

Using a Dockerfile

To work with the submodule, I adopted a Dockerfile-based build.

Setting sdk to docker in the Space configuration (the YAML block at the top of README.md) makes Hugging Face build the Space from the Dockerfile:

---
title: NDL Kotenseki OCR-Lite Gradio App
emoji: 👀
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
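Hugging Face reads this configuration from the YAML block at the top of the Space's README.md. The sketch below makes that structure concrete. It uses a hypothetical helper, read_front_matter, that extracts the flat "key: value" pairs and reports the sdk value. It assumes no nested YAML and is only an illustration, not how Hugging Face itself parses the file.

```python
from typing import Dict


def read_front_matter(readme_text: str) -> Dict[str, str]:
    """Return the key/value pairs between the leading '---' delimiters.

    Hypothetical helper: handles only flat "key: value" lines, no nested YAML.
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter block at the top of the file
    config: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


README = """---
title: NDL Kotenseki OCR-Lite Gradio App
emoji: 👀
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
---
"""

print(read_front_matter(README)["sdk"])  # docker
```

A real project would use a YAML library instead. Splitting on the first colon is just enough to show that sdk: docker is a plain top-level key.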

Using Gradio Version 4.44.1

Initially I used Gradio 5.7.1, but calling the API (described below) failed with the following error:

ValueError: Could not fetch api info for https://nakamura196-ndlkotenocr-lite.hf.space/: {"detail":"Not Found"}

Switching to Gradio 4.44.1 resolved this error.
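In practice the fix amounts to pinning gradio==4.44.1 in the dependency list. You may also want the app to flag an unexpected upgrade at startup. A sketch along these lines would do that. check_gradio_version is a hypothetical helper, and treating every non-4.x release as suspect is my own assumption, based only on the 5.7.1 failure.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '4.44.1'."""
    return int(version_string.split(".")[0])


def check_gradio_version(expected_major: int = 4) -> Optional[str]:
    """Warn if the installed Gradio major version differs from expected_major.

    Returns the installed version string, or None if Gradio is not installed.
    """
    try:
        installed = version("gradio")
    except PackageNotFoundError:
        return None  # Gradio is not installed in this environment
    if major_version(installed) != expected_major:
        # API access failed with 5.7.1 and worked with 4.44.1 in this setup.
        warnings.warn(
            f"Gradio {installed} is installed; the API was only confirmed "
            f"to work with {expected_major}.x (4.44.1)."
        )
    return installed


if __name__ == "__main__":
    print(check_gradio_version())
```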

API Usage

Below is an example using an image from “The Tale of Genji” (University of Tokyo General Library).

from gradio_client import Client, handle_file

client = Client("https://nakamura196-ndlkotenocr-lite.hf.space/")
result = client.predict(
    image_path=handle_file('https://iiif.dl.itc.u-tokyo.ac.jp/iiif/genji/TIFF/A00_6587/01/01_0004.tif/full/900,/0/default.jpg'),
    api_name="/predict"
)
print(result)
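The image URL in this example follows the IIIF Image API pattern {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}. The parts are:

- full: the whole image
- 900,: a width of 900 pixels, with the height scaled to match
- 0: no rotation
- default.jpg: the default quality as JPEG

A hypothetical helper, build_iiif_image_url, makes it easy to request other sizes. It simply joins the parts and, like the example URL, leaves the slashes in the identifier unencoded.

```python
def build_iiif_image_url(
    base: str,
    identifier: str,
    region: str = "full",
    size: str = "900,",
    rotation: int = 0,
    quality: str = "default",
    fmt: str = "jpg",
) -> str:
    """Join IIIF Image API parameters into a request URL (hypothetical helper).

    size "900," means: width 900 px, height scaled proportionally.
    The identifier is inserted as-is (slashes are not percent-encoded),
    matching the example URL passed to the Space.
    """
    return f"{base.rstrip('/')}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"


url = build_iiif_image_url(
    "https://iiif.dl.itc.u-tokyo.ac.jp/iiif",
    "genji/TIFF/A00_6587/01/01_0004.tif",
)
print(url)
```

Passing size="1200," would request a 1200-pixel-wide image instead. The resulting URL can be passed to handle_file in the same way.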

The client.predict call returns four outputs: an image, the recognized text, XML, and JSON data.

'{ใ„<''pใค?ci'''rใ‚Œxo{{{{{{{{{miiiiใฎmn'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''gmmmvๅพกlt[[[iiticbiiticbiiticbiiticbiiticbiiticbiiticbiiticbiiticbiiticigggaๆ™‚e444dsesoodsesoodsesoodsesoodsesoodsesoodsesoodsesoodsesoodseson___tใ‚ˆvn366'VxTnu'VxTnu'VxTnu'VxTnu'VxTnu'VxTnu'VxTnu'VxTnu'VxTnu'VxTnfhpneใ‚Šet322:etefn:etefn:etefn:etefn:etefn:etefn:etefn:etefn:etefn:etefoeaa/ใ‹rs,,,r'xidr'xidr'xidr'xidr'xidr'xidr'xidr'xidr'xidr'xi'itmvๅฅณs'0t:tdi1t:tdi2t:tdi3t:tdi4t:tdi5t:tdi6t:tdi7t:tdi8t:tdi9t:td:gheaไปi:515,ilen,ilen,ilen,ilen,ilen,ilen,ilen,ilen,ilen,ileh''rๆ›ดo363c'ingc'ingc'ingc'ingc'ingc'ingc'ingc'ingc'ingc'in{t::/่กฃn[494aใ„ncBaไธญncBaใncBaใ‚ncBaใ‚ncBaใฏncBaไบบncBaใ‘ncBaใชncBaไบบnc''fใ‚=[]]]lใคeeolใซeeolใ‚eeolใ‹eeolใeeolใพeeolใฎeeolใ‚€eeolใ‚‹eeolใฎeei:''oใพ"{,,]'ใ‚Œ''x'ใ„''x'ใ''x'ใ‚Š''x'ใญ''x'ใ—''x'ๅฟƒ''x'ใ„''x'ใ‚’''x'ใ''mddlใŸ1',:ใฎ::':ใจ::':็ตฆ::':็ตฆ::':ใฟ::':ใฆ::':ใ‚’::':ใจ::':ใ„::':ใ—::g6eedใ•.bๅพก:ใ‚„:ใต:ใธ:ๆง˜:ใ‚„:ใ†:ใ‚:ใ‚ˆ:ใ‚Š_7ffeใต0o'ๆ™‚'0'ใ‚“'0'ใ‚'0'ใ‚‹'0'ใŠ'0'ใ™'0'ใ”'0'ใค'0'ใ€ณ'0'ใ‚’'0w6aarใ‚‰"utใ‚ˆt.[tใ”t.[tใ‚Št.[tๅพกt.[tใชt.[tใ‹t.[tใ‹t.[tใ—t.[tใ€ตt.[tใ‚‚t.i,uusใฒnrใ‚Šr8[rใจr8[rใ‘r8[rใ‹r8[rใ—r8[rใ‚‰r8[rใ—r8[rใr8[rใ‚r8[rใˆr8dll/็ตฆ?duใ‹u44uใชu13uใ‚Šu13uใŸu43uใปu12uใ™u32uใ†u42uใชu41uใ‹u31uใฏu4tttzใ‘>ieๅฅณe50eใe47eใฏe24eใ€ณe11eใจe97eๆœe54eใ‚‰e52eใ‚Še28eใ™e}5eใ‚e4h..5ใ‚‹\n'ไป'}1'ใ‚'}2'ใ—'}2'ใ€ต'}2'ใ'}9'ๅค•'}8'ใฟ'}0'่กŒ'}9'ๅ“€',8'ใ‹'}'jjng,ๆ›ด,,,,ใฏ,,,,ใ‚,,,,ใ‚,,,,ใ‚Œ,,,,ใฎ,,,,ใ‚’,,,,ใ‚‚,,,,ใช,,,ใ‚‰,]:pp3n<B่กฃใซใ‚ˆใ–ไธ‹ๅฎฎใŠใฎใ‚‹ใ›]eepไธญOoใ‚1ใฏ1ใ‚Š1ใพ1ใ‚‰1ใค1ใต1ๅฟƒ1ๆฃš1็ตฆ,9gg9ใซCxใพ6ใ‚6ๆˆ‘6ใ—6ใ†6ใ‹6ใค7ใป6ใซ6ใฏ0''sใ„R'ใŸ9ใ‚‰3ใ„2ใ9ใฎ2ใธ2ใ‚‚0ใ2ใŠ9ใ™0,}8ใจD:ใ•]ใฌ]ใจ]ใ‚‚]ๆ›ด]ใซ]ใ‚Š]ใ‘]ใป]ไธ–,}mใ‚„Aใต,ใ‹,ๆ€,ใฎ,่กฃ,ใค,ใซ,ใซ,ใ‚,ใฎ)0ใ‚“T[ใ‚‰ใ™ใฒใซใŸใ‘ใ‚„ใ•ใ—ใŸ1ใ”A[ใฒ[ใ['[ใ‚’[ใก[ใฆ[ใ‚[ใจ[ใฆ[ใ‚1ใจS4็ตฆ4ใ‚Œ3,3ใจ3'2ใ‚‚2ใ‚Š2ใ‹1'1ใ—dใชE3ใ‘0ใ74ใ—1,7'4'2ใก8,5'vใT3ใ‚‹1ใจ22'29,8,0'98
,0ใ‚>,',',,,,,,,,,,tใฏ\,,jใซn1555555555tใฏ\63443443437ใ‚t9300410202cใ‚‰<]]]]]]]]]]hใฌP,,,,,,,,,,2ใ‹A1ใ™G[[[[[[[[[jใE433332221tใ‚Œ397307418mใI091979776hใจM,,,,,,,,,0\A0nG1111111110ใE6666667660ใ‚N932922029gใA]]]]]]]]]n็ตฆM,,,,,,,,,/ใตETใ‚=[[[[[[[[[/ใ‚Š"433332221gใ‘d397307418rใ‚Še091979776aใฏf,,,,,,,,,dใ—aiใ‚u555555555oใ‚ˆl344344343/ใ‚Št3004102021ๆˆ‘.]]]]]]]]]4ใ„j]]]]]]]]]1ใจp,,,,,,,,,9ๆ€e0ใฒge\"an2ใ‚W0ใ‹Ieใ‚ŠD0็ตฆT0ใธHeใ‚‹=2ๅพก"4ใ‹9fใŸ05ใ€ณ00ใ€ต"cใ‚0ใ–HbใพEaใ—I0ใGfใ‚‚H1ใฎTeใซ=fใ‚’"dใจ69ใ—7f\64n"1ใ‚>1ใ\7ใญn4ใฟ\7ๆง˜taใŠ\6ใชtbใ—<bใปL7ใจIfใN9ใ‚ŒEfไธ‹4ใ‚‰T0ใ†Y6ใฎP7ๆ›ดE4่กฃ=0ใŸ"4ใกๆœฌd\ๆ–‡9n"4ใฏcใพX1ใ—=fใฆ"cใ‚„4bใ™36ใ‹38ใ‚‰"1ใ™fๆœY9ๅค•=/ใฎ"iๅฎฎ1mใค6aใ‹9gใธ"eใซ.ใคWwใ‘IeใฆDbใ‚‚Tp\H'n=,ไบบ"ใฎ2ๅฟƒ9ใ‚’"ใ†ใ”Hใ‹Eใ—Iใ†Gใ‚‰HใฟTใ‚’=ใŠ"ใต3ใค6ใ‚‚5ใ‚Š"ใซใ‚„Cใ‚Oใ‚ŠN\Fn=ใ‘"ใ‚€0ใ„.ใจ8ใ‚4ใค5ใ—"ใใชOใ‚ŠR่กŒDใ‚‚EใฎRๅฟƒ=ใป"ใ0ใ‘"ใซใ•SใจTใ‹RใกI\NnGใช=ใ‚‹"ใ‚’ใ„ใ„ใคใ‚ˆใ‚Œใ€ณใฎใ€ตๅพกใ‚ๆ™‚ใ‹ใ‚ˆใ™ใ‚Šๅ“€ใ‹ใชๅฅณใ‚‹ไปๆฃšๆ›ดใซ่กฃใŠใ‚ใปใพใ‚ใŸใ—ใ•ใฆใต\ใ‚‰nใฒไบบ็ตฆใฎใ‘ใใ‚‹ใ—"ใ‚Š/ใ‚’>ใ‚‚\ใˆnใฏ\ใ‚tใ‹\ใ‚‰tใ›<็ตฆLใฏIใ™Nไธ–EใฎใŸTใ‚Yใ—P'E,="ๆœฌๆ–‡"X="401"Y="169"WIDTH="29"HEIGHT="364"CONF="0.814"ORDER="1"STRING="ไธญใซใ„ใจใ‚„ใ‚“ใ”ใจใชใใ‚ใฏใซใฏใ‚ใ‚‰ใฌใ‹ใ™ใใ‚Œใใจ"/>\n\t\t<LINETYPE="ๆœฌๆ–‡"X="372"Y="163"WIDTH="27"HEIGHT="377"CONF="0.812"ORDER="2"STRING="ใใ‚ใ็ตฆใตใ‚ใ‚Šใ‘ใ‚Šใฏใ—ใ‚ใ‚ˆใ‚Šๆˆ‘ใ„ใจๆ€ใฒ"/>\n\t\t<LINETYPE="ๆœฌๆ–‡"X="342"Y="162"WIDTH="29"HEIGHT="378"CONF="0.841"ORDER="3"STRING="ใ‚ใ‹ใ‚Š็ตฆใธใ‚‹ๅพกใ‹ใŸใ€ณใ€ตใ‚ใ–ใพใ—ใใ‚‚ใฎใซใ‚’ใจใ—"/>\n\t\t<LINETYPE="ๆœฌๆ–‡"X="312"Y="169"WIDTH="27"HEIGHT="365"CONF="0.819"ORDER="4"STRING="ใ‚ใใญใฟๆง˜ใŠใชใ—ใปใจใใ‚Œไธ‹ใ‚‰ใ†ใฎๆ›ด่กฃใŸใก"/>\n\t\t<LINETYPE="ๆœฌๆ–‡"X="279"Y="162"WIDTH="28"HEIGHT="379"CONF="0.835"ORDER="5"STRING="ใฏใพใ—ใฆใ‚„ใ™ใ‹ใ‚‰ใ™ๆœๅค•ใฎๅฎฎใคใ‹ใธใซใคใ‘ใฆใ‚‚"/>\n\t\t<LINETYPE="ๆœฌๆ–‡"X="248"Y="162"WIDTH="31"HEIGHT="378"CONF="0.845"ORDER="6"STRING="
ไบบใฎๅฟƒใ‚’ใ†ใ”ใ‹ใ—ใ†ใ‚‰ใฟใ‚’ใŠใตใคใ‚‚ใ‚Šใซใ‚„ใ‚ใ‚Š"/>\n\t\t<LINETYPE="ๆœฌๆ–‡"X="220"Y="170"WIDTH="27"HEIGHT="362"CONF="0.842"ORDER="7"STRING="ใ‘ใ‚€ใ„ใจใ‚ใคใ—ใใชใ‚Š่กŒใ‚‚ใฎๅฟƒใปใใ‘ใซใ•ใจใ‹ใก"/>\n\t\t<LINETYPE="ๆœฌๆ–‡"X="189"Y="162"WIDTH="28"HEIGHT="378"CONF="0.830"ORDER="8"STRING="ใชใ‚‹ใ‚’ใ„ใ‚ˆใ€ณใ€ตใ‚ใ‹ใ™ๅ“€ใชใ‚‹ๆฃšใซใŠใปใ‚ใ—ใฆ"/>\n\t\t<LINETYPE="ๆœฌๆ–‡"X="158"Y="169"WIDTH="28"HEIGHT="363"CONF="0.844"ORDER="9"STRING="ไบบใฎใใ—ใ‚Šใ‚’ใ‚‚ใˆใฏใ‚ใ‹ใ‚‰ใ›็ตฆใฏใ™ไธ–ใฎใŸใ‚ใ—"/>\n\t</PAGE>\n</OCRDATASET>\n',

Development

The repository also includes a docker-compose.yml, so the app can be used to set up a development environment or deployed to a production environment outside Hugging Face Spaces.

Summary

I am grateful that “NDL Kotenseki OCR-Lite” has been released as open-source software.

I am still unfamiliar with Docker-based development, so there may be inaccuracies, but I hope this serves as a helpful reference.