Overview

This memo describes how to run NDLOCR, the OCR application published by the NDL (National Diet Library), on a virtual machine on GCP (Google Cloud Platform). For details about the application, please refer to the following repository.

https://github.com/ndl-lab/ndlocr_cli

Creating a VM Instance

Access Compute Engine on GCP and click the “Create Instance” button at the top of the screen.

Under “Machine configuration” > “Machine family”, select “GPU”. Then for “GPU type”, select “NVIDIA T4”, which is the most affordable option. Set “Number of GPUs” to 1.

For “Machine type”, select “n1-standard-2”.

With the smaller “n1-standard-1”, a MemoryError occurred.

Next, under “Boot disk”, select “Switch image”. Then select the recommended “Deep Learning on Linux”.

An important note here: change the “Size” from the default 50GB to 100GB. With 50GB, a “no space left” error occurred.

The following shows the disk usage (df -h) after the environment setup was completed. Since over 40GB was already used, I recommend setting a generous “Size”.

$ df -h
[Output listing Filesystem, Size, Used, Avail, Use%, and Mounted on for the udev, tmpfs, /dev/sda1, and /dev/sda15 entries of the instance.]
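Since the build fails with “no space left” when the disk is too small, a free-space check before starting can save a rebuild. The following is a minimal Python sketch using only the standard library. The function names and the 20 GiB default threshold are my own illustrative assumptions, not values taken from NDLOCR.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def free_gib(path="/"):
    """Return the free space, in GiB, of the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room(path="/", required_gib=20.0):
    """Return True when at least `required_gib` GiB are free (threshold is hypothetical)."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    print(f"free: {free_gib():.1f} GiB, enough: {has_room()}")
```

Running it on the VM before dockerbuild.sh gives a quick indication of whether the boot disk needs to be enlarged.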

After that, click the “Create” button at the bottom of the screen to complete the VM instance creation.
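The console choices described here can also be written down as a gcloud command line, which is handy when recreating the instance later. The sketch below only builds and prints the command and does not run it. The instance name, the zone, and the image family common-cu113 are assumptions on my part, so please check them against your own project before use.

```python
import shlex


def build_create_command(name, zone, image_family="common-cu113"):
    """Build a `gcloud compute instances create` argv mirroring the console settings.

    `name`, `zone`, and `image_family` are placeholders (assumptions), not values
    confirmed by the original walkthrough.
    """
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        "--machine-type=n1-standard-2",                 # n1-standard-1 ran out of memory
        "--accelerator=type=nvidia-tesla-t4,count=1",   # one NVIDIA T4
        "--maintenance-policy=TERMINATE",               # GPU instances cannot live-migrate
        f"--image-family={image_family}",
        "--image-project=deeplearning-platform-release",
        "--boot-disk-size=100GB",                       # 50GB was not enough
    ]


if __name__ == "__main__":
    print(shlex.join(build_create_command("ndlocr-vm", "us-central1-a")))
```

Printing the command instead of executing it makes it easy to review the machine type, GPU, and disk size before any billable resource is created.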

After a while, the VM instance starts up and appears in the list of instances. Click the “SSH” button to connect to the VM instance.

Operations Inside the VM Instance

Installing the Nvidia Driver

After connecting via SSH, the following screen is displayed. Press “y” to install the Nvidia driver.

======================================
Welcome to the Google Deep Learning VM
======================================

Version: common-cu113.m91
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-19-cloud-amd64 x86_64\n)
...
This VM requires Nvidia drivers to function correctly. Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n]

However, running this immediately after startup resulted in the following error.

Would you like to install the Nvidia driver? [y/n] y
Installing Nvidia driver.
wait apt locks released
install linux headers: linux-headers-4.19.0-19-cloud-amd64
E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
Nvidia driver installed.

Running the following ps command showed that other apt processes were still running.

ps aux | grep apt

Therefore, I exited the session, waited a moment, and reconnected via SSH. The same question was asked again, and pressing “y” this time completed the installation as follows.

Would you like to install the Nvidia driver? [y/n] y
Installing Nvidia driver.
wait apt locks released
install linux headers: linux-headers-4.19.0-19-cloud-amd64
Reading package lists... Done
Building dependency tree
Reading state information... Done
...
DRIVER_VERSION: 470.57.02
...
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 470.57.02...
WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver.
WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
Nvidia driver installed.

I did not understand the meaning of the above WARNINGs, so I ignored them…
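The lock error earlier came from apt still running in the background right after boot. Instead of reconnecting by hand, one could poll the process list until apt is finished. The following is a minimal Python sketch of that idea. The helper names, the token pattern used to recognise apt-related commands, and the polling intervals are my own assumptions, and it is not part of the Deep Learning VM tooling.

```python
import re
import subprocess
import time

# A command-line token that starts with apt, dpkg, or unattended-upgr (assumed pattern).
_APT_TOKEN = re.compile(r"(?:^|[/\s])(?:apt|dpkg|unattended-upgr)")


def apt_processes(ps_output):
    """Return the command strings in `ps aux` output that look like apt/dpkg work."""
    found = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 10)         # the 11th column is the full command
        if len(fields) < 11:
            continue
        command = fields[10]
        if command.startswith("grep"):        # ignore a grep used for searching
            continue
        if _APT_TOKEN.search(command):
            found.append(command)
    return found


def wait_for_apt(run_ps=None, poll_seconds=10.0, timeout_seconds=600.0):
    """Poll until no apt-like process remains; return False if the timeout passes."""
    if run_ps is None:
        def run_ps():
            result = subprocess.run(
                ["ps", "aux"], capture_output=True, text=True, check=True
            )
            return result.stdout
    deadline = time.monotonic() + timeout_seconds
    while True:
        if not apt_processes(run_ps()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


if __name__ == "__main__":
    print("apt is idle" if wait_for_apt() else "apt is still busy")
```

Once wait_for_apt() returns True, rerunning the driver installation should no longer collide with the dpkg lock. The run_ps parameter exists only so the polling logic can be exercised without a real process list.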

Starting the Docker Container

From here on, I could proceed following the GitHub README.md.

Since docker and git were already installed, I ran the following. Note that running dockerbuild.sh takes some time.

git clone --recursive https://github.com/ndl-lab/ndlocr_cli
cd ndlocr_cli
sh ./docker/dockerbuild.sh
sh ./docker/run_docker.sh

Running Inference

This could also be done following the README.md.

This time, I ran inference on page 4 of “Koui Genji Monogatari, Volume 1”.

https://dl.ndl.go.jp/info:ndljp/pid/3437686/4

# Log in to the container
docker exec -i -t --user root ocr_cli_runner bash

Try running inference inside the container.

# Download an image to sample_data/img
wget https://www.dl.ndl.go.jp/api/iiif/3437686/R0000004/full/full/0/default.jpg -P sample_data/img/
# Run inference (with -x option to also output recognition result xml files.)
python main.py infer sample_data output_dir -x
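The image URL used with wget follows the IIIF Image API layout (region/size/rotation/quality.format). To fetch other pages, the URL can be generated rather than edited by hand. The sketch below assumes that the page segment is always “R” followed by a 7-digit zero-padded page number. I inferred this from the single URL above, so treat it as an assumption. The function names are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

BASE = "https://www.dl.ndl.go.jp/api/iiif"


def ndl_iiif_url(pid, page, base=BASE):
    """Build a full-size JPEG URL for one page (page-number padding is an assumption)."""
    return f"{base}/{pid}/R{page:07d}/full/full/0/default.jpg"


def download_page(pid, page, out_dir="sample_data/img"):
    """Download one page image into `out_dir` and return the saved path."""
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{pid}_{page:04d}.jpg"   # hypothetical file naming
    urlretrieve(ndl_iiif_url(pid, page), str(target))
    return target


if __name__ == "__main__":
    print(ndl_iiif_url(3437686, 4))
```

For pid 3437686 and page 4 this reproduces the URL passed to wget above.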

The inference runs as follows.

[Log of python main.py infer sample_data output_dir -x: model and checkpoint loading messages, the detected regions printed as dictionaries with TYPE, X, Y, WIDTH, HEIGHT, and CONF entries, per-stage progress, and a closing library warning that recommends manually replacing a transform in the test data pipeline of the config file.]

The OCR results can then be checked in the output XML file, as follows.

[XML output beginning with <?xml version='1.0' encoding='utf-8'?>: an OCRDATASET document whose PAGE element for default.jpg contains BLOCK and LINE elements. Each LINE carries the attributes CONF, HEIGHT, STRING, TYPE, WIDTH, X, and Y.]
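To pull just the recognised text out of such an XML file, the LINE elements can be read with Python's standard library. This is a minimal sketch. The attribute names (STRING, CONF, X, Y) are the ones that appear in the output above, while the sample document, its values, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that imitates the structure of the real output.
SAMPLE = """<OCRDATASET>
  <PAGE HEIGHT="100" IMAGENAME="default.jpg" WIDTH="80">
    <LINE CONF="0.999" HEIGHT="40" STRING="first line" WIDTH="10" X="60" Y="5" />
    <LINE CONF="0.950" HEIGHT="40" STRING="second line" WIDTH="10" X="40" Y="5" />
  </PAGE>
</OCRDATASET>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}LINE' down to 'LINE'."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(xml_text):
    """Return one dict per LINE element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root.iter():
        if not isinstance(element.tag, str) or local_name(element.tag) != "LINE":
            continue
        rows.append({
            "text": element.get("STRING", ""),
            "conf": float(element.get("CONF", "nan")),
            "x": element.get("X"),
            "y": element.get("Y"),
        })
    return rows


if __name__ == "__main__":
    for row in extract_lines(SAMPLE):
        print(f'{row["conf"]:.3f}\t{row["text"]}')
```

Matching on the local tag name keeps the sketch working whether or not the real file declares an XML namespace, which I have not verified.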

Summary

I was able to successfully run the NDLOCR application. Please remember to stop the instance after running it.

I deeply appreciate the NDL staff who published this application.

Addendum

2022.04.28

I wrote an article about running it using Google Colab. I hope this is also useful.