I recently wrote the following article:

As a follow-up, I briefly investigated the execution time of NDLOCR on Google Colab; here are the results.

Configuration

The GPU used was:

Tesla P100-PCIE-16GB (16280 MiB of memory, CUDA Version 11.2), as reported by nvidia-smi.

The following image was used. The size was 5000 x 3415 px, 1.1 MB:

https://dl.ndl.go.jp/info:ndljp/pid/3437686/6

There are four inference processing steps, but this time only “Layout extraction” and “Character recognition (OCR)” were executed:

‘-p 0’: Gutter splitting
‘-p 1’: Skew correction
‘-p 2’: Layout extraction
‘-p 3’: Character recognition (OCR)

Using Google Drive

Let’s consider the case where a mounted Google Drive is used for file I/O. The following input option was used:

Single input dir mode (specified with ‘-s s’; the default)

Running on 1 file produced the following results:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 05:30:58 | 11 |
| p2 | Inference start | 2022-04-29 05:31:09 | 2 |
| p3 | End | 2022-04-29 05:31:11 | (Total) 13 |

The time from p1 to p2 is spent loading configuration files, etc. The inference time per image was 2s.
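The per-step times in these tables are simply differences between consecutive logged timestamps. A minimal helper (a sketch, not part of the NDLOCR code) that recomputes them:

```python
from datetime import datetime

def elapsed_seconds(timestamps):
    """Seconds between each pair of consecutive timestamps."""
    ts = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(ts, ts[1:])]

# Timestamps logged for the 1-file run above.
deltas = elapsed_seconds([
    "2022-04-29 05:30:58",  # p1: start
    "2022-04-29 05:31:09",  # p2: inference start
    "2022-04-29 05:31:11",  # p3: end
])
print(deltas, sum(deltas))  # [11, 2] 13
```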

I registered the same image again and ran on 2 files:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 05:38:02 | 10 |
| p2 | Inference start | 2022-04-29 05:38:12 | 6 |
| p3 | End | 2022-04-29 05:38:18 | (Total) 16 |

I registered the same image once more and ran on 3 files:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 05:40:26 | 10 |
| p2 | Inference start | 2022-04-29 05:40:36 | 8 |
| p3 | End | 2022-04-29 05:40:44 | (Total) 18 |

From the above results, we can see that the initial loading of configuration files takes about 10s, and each image takes about 2-3s of processing time.
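These numbers suggest a simple cost model: each invocation of the program pays the startup cost once, plus a per-image cost. The helper below is hypothetical, using ~10 s startup and ~2.5 s per image as rough averages from the runs above:

```python
def total_time(n_images, runs=1, startup=10.0, per_image=2.5):
    """Each run pays `startup` once, plus `per_image` per image processed."""
    return runs * startup + n_images * per_image

# Single input dir mode: one run handles all images.
print(total_time(10))           # 10 + 10*2.5 = 35.0
# Image file mode: one run (and one startup cost) per image.
print(total_time(10, runs=10))  # 10*10 + 10*2.5 = 125.0
```

For two files the model predicts a 10 s gap between the modes, which matches the measurement in the next section.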

In the notebook I created and shared earlier, main.py was executed once per image file, even when there were multiple input images, using the following option:

Image file mode (specified with ‘-s f’; use this when the input is a single image file)

This means that for each image file (except the first), about 10s of unnecessary time was spent on initial loading.

In fact, when running on 2 files using Image file mode, the time increased by exactly 10s (the time required for initial loading) compared to Single input dir mode:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start of 1st file | 2022-04-29 05:52:59 | 11 |
| p2 | Inference start | 2022-04-29 05:53:10 | 2 |
| p3 | End | 2022-04-29 05:53:12 | 1 |
| p4 | Start of 2nd file | 2022-04-29 05:53:13 | 10 |
| p5 | Inference start | 2022-04-29 05:53:23 | 2 |
| p6 | End | 2022-04-29 05:53:25 | (Total) 26 |

While this may be obvious, Single input dir mode is recommended when processing a large number of images.

(Reference) Using GCS (Google Cloud Storage)

I also measured the case of using GCS mounted from Google Colab. Results may vary depending on various settings, but the purpose is to compare with Google Drive described above.

The following input option was used:

Single input dir mode (specified with ‘-s s’; the default)

For 1 file:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 06:06:08 | 13 |
| p2 | Inference start | 2022-04-29 06:06:21 | 13 |
| p3 | End | 2022-04-29 06:06:34 | (Total) 26 |

For 2 files:

| ID | Process | Timestamp | Time Taken (seconds) |
|----|---------|-----------|----------------------|
| p1 | Start | 2022-04-29 06:04:08 | 12 |
| p2 | Inference start | 2022-04-29 06:04:20 | 27 |
| p3 | End | 2022-04-29 06:04:47 | (Total) 39 |

While the initial loading time barely changed, the per-image processing time increased about five-fold. This was because saving the inference results (images and text files) took much longer on GCS.
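The roughly five-fold figure can be sanity-checked from the two-file runs, comparing the inference phase only (6 s on Google Drive vs 27 s on GCS):

```python
drive_per_image = 6 / 2   # 2-file run on Google Drive: 6 s of inference + saving
gcs_per_image = 27 / 2    # 2-file run on GCS: 27 s of inference + saving
print(gcs_per_image / drive_per_image)  # 4.5
```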

While this may also be obvious, it was confirmed that using GCS for I/O with this notebook is not recommended when dealing with large numbers of images.

Summary

I investigated the execution time of NDLOCR using Google Colab. Results may vary depending on various settings, but I hope some parts serve as a useful reference.