I recently wrote the following article:

As a follow-up, I briefly investigated the execution time of NDLOCR on Google Colab; here are the results.

Configuration

The GPU used was:

Tesla P100-PCIE-16GB (16280 MiB of memory, CUDA Version 11.2), as reported by nvidia-smi.

The following image was used. The size was 5000 x 3415 px, 1.1 MB:

https://dl.ndl.go.jp/info:ndljp/pid/3437686/6

There are four inference processing steps, but this time only “Layout extraction” and “Character recognition (OCR)” were executed:

‘-p 0’: Gutter splitting
‘-p 1’: Skew correction
‘-p 2’: Layout extraction
‘-p 3’: Character recognition (OCR)

Using Google Drive

Let’s consider the case where a mounted Google Drive is used for file I/O. The following input option was used:

Single input dir mode (specified with ‘-s s’; the default)

Running on 1 file produced the following results:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 05:30:58 | 11 |
| p2 | Inference start | 2022-04-29 05:31:09 | 2 |
| p3 | End | 2022-04-29 05:31:11 | (Total) 13 |

The time from p1 to p2 is spent loading configuration files, etc. The inference time per image was 2s.
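The per-step times in these tables are simply differences between consecutive logged timestamps. A minimal helper (a sketch, not part of the NDLOCR code) that recomputes them:

```python
from datetime import datetime

def elapsed_seconds(timestamps):
    """Seconds between each pair of consecutive timestamps."""
    ts = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]

# Timestamps logged for the 1-file run above.
deltas = elapsed_seconds([
    "2022-04-29 05:30:58",  # p1: start
    "2022-04-29 05:31:09",  # p2: inference start
    "2022-04-29 05:31:11",  # p3: end
])
print(deltas, sum(deltas))  # [11, 2] 13
```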

I registered the same image again and ran on 2 files:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 05:38:02 | 10 |
| p2 | Inference start | 2022-04-29 05:38:12 | 6 |
| p3 | End | 2022-04-29 05:38:18 | (Total) 16 |

I registered the same image once more and ran on 3 files:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 05:40:26 | 10 |
| p2 | Inference start | 2022-04-29 05:40:36 | 8 |
| p3 | End | 2022-04-29 05:40:44 | (Total) 18 |

From the above results, we can see that the initial loading of configuration files takes about 10s, and each image takes about 2-3s of processing time.
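These numbers suggest a simple cost model: each invocation of the program pays the startup cost once, plus a per-image cost. The helper below is hypothetical, using ~10 s startup and ~2.5 s per image as rough averages from the runs above:

```python
def total_time(n_images, runs=1, startup=10.0, per_image=2.5):
    """Each run pays `startup` once, plus `per_image` per image processed."""
    return runs * startup + n_images * per_image

# Single input dir mode: one run handles all images.
print(total_time(10))           # 10 + 10*2.5 = 35.0
# Image file mode: one run (and one startup cost) per image.
print(total_time(10, runs=10))  # 10*10 + 10*2.5 = 125.0
```

For two files the model predicts a 10 s gap between the modes, which matches the measurement in the next section.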

In the notebook I created and shared earlier, main.py was executed once per image file, even when there were multiple input images, using the following option:

Image file mode (specified with ‘-s f’; use this when the input is a single image file)

This means that for each image file (except the first), about 10s of unnecessary time was spent on initial loading.

In fact, when running on 2 files using Image file mode, the time increased by exactly 10s (the time required for initial loading) compared to Single input dir mode:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start of 1st file | 2022-04-29 05:52:59 | 11 |
| p2 | Inference start | 2022-04-29 05:53:10 | 2 |
| p3 | End | 2022-04-29 05:53:12 | 1 |
| p4 | Start of 2nd file | 2022-04-29 05:53:13 | 10 |
| p5 | Inference start | 2022-04-29 05:53:23 | 2 |
| p6 | End | 2022-04-29 05:53:25 | (Total) 26 |

While this may be obvious, Single input dir mode is recommended when processing a large number of images.

(Reference) Using GCS (Google Cloud Storage)

I also measured the case of using GCS mounted from Google Colab. Results may vary depending on various settings, but the purpose is to compare with Google Drive described above.

The following input option was used:

Single input dir mode (specified with ‘-s s’; the default)

For 1 file:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 06:06:08 | 13 |
| p2 | Inference start | 2022-04-29 06:06:21 | 13 |
| p3 | End | 2022-04-29 06:06:34 | (Total) 26 |

For 2 files:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 06:04:08 | 12 |
| p2 | Inference start | 2022-04-29 06:04:20 | 27 |
| p3 | End | 2022-04-29 06:04:47 | (Total) 39 |

While the initial loading time barely changed, the per-image processing time increased about five-fold. This was because saving the inference results (images and text files) took much longer on GCS.
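The roughly five-fold figure can be sanity-checked from the two-file runs, comparing the inference phase only (6 s on Google Drive vs 27 s on GCS):

```python
drive_per_image = 6 / 2   # 2-file run on Google Drive: 6 s of inference + saving
gcs_per_image = 27 / 2    # 2-file run on GCS: 27 s of inference + saving
print(gcs_per_image / drive_per_image)  # 4.5
```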

While this may also be obvious, it was confirmed that using GCS for I/O with this notebook is not recommended when dealing with large numbers of images.

Summary

I investigated the execution time of NDLOCR using Google Colab. Results may vary depending on various settings, but I hope some parts serve as a useful reference.