Overview
This memo records the issues I encountered when running ndlocr_cli, the application repository for NDLOCR (ver.2.1), and how I resolved them.
Note that many of these issues were caused by my own configuration oversights or atypical usage, and are unlikely to occur during normal use. Please refer to this article if you encounter similar issues.
Shared Memory Shortage
When running ndlocr_cli, an error containing the message "Unexpected bus error encountered in worker" occurred.
The response from ChatGPT was as follows.
The “Unexpected bus error encountered in worker” error message typically occurs when there is insufficient shared memory when using PyTorch’s DataLoader. This is especially seen when the dataset is large or many workers are used.
It also gave the following instructions.
If you are using Docker or another virtual environment, you need to increase the shared memory size. When using Docker, set the --shm-size option when starting the container. For example, set it as docker run --shm-size 2G ....
When I checked my docker run command, I found that it lacked the --shm-size option. The repository's run_docker.sh script, linked below, specifies --shm-size=256m.
https://github.com/ndl-lab/ndlocr_cli/blob/master/docker/run_docker.sh
After adding this option, the shared memory shortage error was resolved.
Reference: Checking the Current Shared Memory Size
This could be checked with the following command.
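One way to check it from Python is sketched below, assuming the shared memory is mounted at /dev/shm (the usual location in Linux containers). The helper name shared_memory_size_mib is made up for this illustration. From a shell, df -h /dev/shm reports the same information.

```python
import os
import shutil


def shared_memory_size_mib(path="/dev/shm"):
    """Return the total size of the mount at `path` in MiB, or None if it is absent."""
    # Assumption: shared memory is exposed as a tmpfs mount at /dev/shm.
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / (1024 * 1024)


size = shared_memory_size_mib()
if size is None:
    print("/dev/shm is not available on this system")
else:
    print(f"/dev/shm: {size:.0f} MiB")
```

Running this inside the container before and after adding --shm-size should show whether the option took effect.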
When the above error occurred, the size was 64 MB, which is Docker's default.
KeyError: 'STRING'
I encountered KeyError: 'STRING' several times. To address this, I made changes to the following two files.
https://github.com/ndl-lab/ndlocr_cli/blob/master/cli/core/inference.py#L681
The errors were raised where the code accesses line_xml.attrib['STRING'] and elm.attrib['STRING'], so I added the following handling.
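A minimal sketch of this kind of handling is shown below. It is not the exact change made to inference.py: the helper get_string and the sample XML are hypothetical, and it assumes that an empty string is an acceptable substitute when the STRING attribute is missing.

```python
import xml.etree.ElementTree as ET


def get_string(elem, default=""):
    """Return the STRING attribute, or `default` when the element has none."""
    # elem.attrib['STRING'] raises KeyError if the attribute is missing;
    # dict.get avoids that by falling back to a default value.
    return elem.attrib.get("STRING", default)


# Hypothetical sample: the second LINE element has no STRING attribute.
sample = ET.fromstring('<PAGE><LINE STRING="abc" X="0"/><LINE X="10"/></PAGE>')
texts = [get_string(line) for line in sample.iter("LINE")]
print(texts)  # ['abc', '']
```

Whether to substitute an empty string or skip the element entirely depends on how the downstream code uses the text.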
Reference: Adding a Progress Bar
I sometimes wanted to display a progress bar during OCR processing. To do so, modify the following section.
https://github.com/ndl-lab/ndlocr_cli/blob/master/cli/core/inference.py#L213
Specifically, add tqdm as follows.
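The general pattern is to wrap the loop's iterable in tqdm. The sketch below illustrates it with a hypothetical process_pages function and a placeholder in place of the real per-page inference; it is not the actual loop in inference.py. The fallback definition only exists so the sketch runs even where tqdm is not installed.

```python
try:
    from tqdm import tqdm  # third-party package: pip install tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        # Fallback without a progress bar: iterate exactly as before.
        return iterable


def process_pages(image_paths):
    """Run a placeholder step on each page while showing progress."""
    results = []
    # Wrapping the iterable is the only change needed to get a progress bar.
    for path in tqdm(image_paths, desc="OCR", unit="page"):
        results.append(path.upper())  # stand-in for the real per-page inference
    return results


print(process_pages(["a.jpg", "b.jpg", "c.jpg"]))
```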
This allows you to check the current progress and estimated remaining time.
Summary
When using ndlocr_cli in a standard manner, the error handling described in this article is likely unnecessary, but I hope it serves as a useful reference when encountering similar issues.