I recently wrote the following article:
This time, I briefly investigated the execution time of NDLOCR on Google Colab; here are the results.
Configuration
The GPU used was:
The following image was used as input (5000 × 3415 px, 1.1 MB):
https://dl.ndl.go.jp/info:ndljp/pid/3437686/6
There are four inference processing steps, but this time only “Layout extraction” and “Character recognition (OCR)” were executed:
- `-p 0`: Gutter splitting
- `-p 1`: Skew correction
- `-p 2`: Layout extraction
- `-p 3`: Character recognition (OCR)
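For reference, the invocation for this kind of run looks roughly like the following. This is a sketch based on the options described in this article; the input/output paths are illustrative, and the exact argument syntax should be checked against the NDLOCR README:

```shell
# Illustrative paths; -s s selects Single input dir mode and -p selects
# which processes run (here layout extraction and OCR only).
# Check the NDLOCR README for the exact argument syntax.
python main.py infer /content/drive/MyDrive/input /content/drive/MyDrive/output -s s -p 2 3
```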
Using Google Drive
Let’s consider the case where a mounted Google Drive is used for file I/O. The following input option was used:
Single input dir mode (specified with `-s s`; the default)
Running on 1 file produced the following results:
| ID | Process | Timestamp | Time Taken (seconds) |
|---|---|---|---|
| p1 | Start | 2022-04-29 05:30:58 | 11 |
| p2 | Inference start | 2022-04-29 05:31:09 | 2 |
| p3 | End | 2022-04-29 05:31:11 | (Total) 13 |
The time from p1 to p2 is spent loading configuration files and the like. The inference time for this single image was 2 s.
I added a copy of the same image and ran on 2 files:
| ID | Process | Timestamp | Time Taken (seconds) |
|---|---|---|---|
| p1 | Start | 2022-04-29 05:38:02 | 10 |
| p2 | Inference start | 2022-04-29 05:38:12 | 6 |
| p3 | End | 2022-04-29 05:38:18 | (Total) 16 |
I added one more copy of the same image and ran on 3 files:
| ID | Process | Timestamp | Time Taken (seconds) |
|---|---|---|---|
| p1 | Start | 2022-04-29 05:40:26 | 10 |
| p2 | Inference start | 2022-04-29 05:40:36 | 8 |
| p3 | End | 2022-04-29 05:40:44 | (Total) 18 |
From the above results, we can see that the initial loading of configuration files takes about 10 s, and processing each image takes about 2–3 s.
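The pattern above can be modeled as a fixed startup cost plus a per-image cost. Here is a minimal sketch; the 10 s and 2.5 s figures are rough values eyeballed from the measurements, not exact fits:

```python
# Rough model of the Google Drive runs above:
#   total(n) = startup + n * per_image
# STARTUP and PER_IMAGE are eyeballed from the measurements
# (totals of 13 s, 16 s, 18 s for 1, 2, 3 files).
STARTUP = 10.0     # initial loading of configuration files etc. (seconds)
PER_IMAGE = 2.5    # processing time per image (seconds)

def estimated_total(n_files):
    """Estimated total run time in seconds for n_files images."""
    return STARTUP + n_files * PER_IMAGE

for n, actual in {1: 13, 2: 16, 3: 18}.items():
    print(f"{n} file(s): model {estimated_total(n):.1f} s, measured {actual} s")
```

The model lands within about 1 s of each measured total, which is close enough for the argument that follows.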
In the notebook I created and shared earlier, `main.py` was executed once per image file, even when there were multiple input images, using the following option:
Image file mode (specified with `-s f`; use this when the input is a single image file)
This means that for every image file except the first, about 10 s was wasted on repeating the initial loading.
In fact, when running on 2 files in Image file mode, the total increased by exactly 10 s (the initial loading time) compared to Single input dir mode:
| ID | Process | Timestamp | Time Taken (seconds) |
|---|---|---|---|
| p1 | Start of 1st file | 2022-04-29 05:52:59 | 11 |
| p2 | Inference start | 2022-04-29 05:53:10 | 2 |
| p3 | End | 2022-04-29 05:53:12 | 1 |
| p4 | Start of 2nd file | 2022-04-29 05:53:13 | 10 |
| p5 | Inference start | 2022-04-29 05:53:23 | 2 |
| p6 | End | 2022-04-29 05:53:25 | (Total) 26 |
(While this may be obvious,) Single input dir mode is recommended when processing a large number of images.
(Reference) Using GCS (Google Cloud Storage)
I also measured the case where GCS is mounted from Google Colab. Results will vary with configuration, but the goal here is simply a comparison with the Google Drive case above.
The following input option was used:
Single input dir mode (specified with `-s s`; the default)
For 1 file:
| ID | Process | Timestamp | Time Taken (seconds) |
|---|---|---|---|
| p1 | Start | 2022-04-29 06:06:08 | 13 |
| p2 | Inference start | 2022-04-29 06:06:21 | 13 |
| p3 | End | 2022-04-29 06:06:34 | (Total) 26 |
For 2 files:
| ID | Process | Timestamp | Time Taken (seconds) |
|---|---|---|---|
| p1 | Start | 2022-04-29 06:04:08 | 12 |
| p2 | Inference start | 2022-04-29 06:04:20 | 27 |
| p3 | End | 2022-04-29 06:04:47 | (Total) 39 |
While the initial loading time didn't change much, the processing time per image increased roughly fivefold. This was because saving the inference results (images and text files) took longer.
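The fivefold figure can be recomputed directly from the measured tables; a small sketch:

```python
# Per-image processing time = (total - initial load) / n_files,
# using the measured values from the tables above.
def per_image_seconds(total, initial_load, n_files):
    return (total - initial_load) / n_files

drive = [(13, 11, 1), (16, 10, 2), (18, 10, 3)]  # (total, load, n_files)
gcs = [(26, 13, 1), (39, 12, 2)]

for total, load, n in drive:
    print("Drive:", per_image_seconds(total, load, n), "s/image")
for total, load, n in gcs:
    print("GCS:  ", per_image_seconds(total, load, n), "s/image")
```

Google Drive comes out at roughly 2–3 s per image, GCS at roughly 13–13.5 s, hence the fivefold difference.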
(While this may also be obvious,) using GCS for file I/O in this notebook is not recommended when dealing with large numbers of images.
Summary
I investigated the execution time of NDLOCR on Google Colab. Results will vary with configuration, but I hope parts of this serve as a useful reference.