I created a notebook for running Tesseract on Google Colab. It also works with Japanese text.

https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/Tesseractを試す.ipynb
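For readers who would rather script this step than click through the notebook, here is a minimal sketch of calling the Tesseract CLI from Python. It assumes the `tesseract` binary and its Japanese language data are already installed (on Colab, for example, via `apt-get install tesseract-ocr tesseract-ocr-jpn`). The helper names `build_tesseract_cmd` and `run_tesseract` are hypothetical and are not taken from the notebook.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_cmd(image: str, out_base: str, lang: str = "jpn", config: str = "hocr") -> list[str]:
    """Build the argument list for: tesseract <image> <outputbase> -l <lang> <config>.

    With the "hocr" config, Tesseract writes <outputbase>.hocr instead of plain text.
    (Hypothetical helper; not part of the original notebook.)
    """
    return ["tesseract", image, out_base, "-l", lang, config]


def run_tesseract(image: str, out_base: str, lang: str = "jpn") -> Path:
    """Run Tesseract on one image and return the path of the hOCR file it writes."""
    if shutil.which("tesseract") is None:
        # Assumption: on Colab the binary comes from apt (tesseract-ocr, tesseract-ocr-jpn).
        raise RuntimeError("tesseract is not on PATH")
    subprocess.run(build_tesseract_cmd(image, out_base, lang), check=True)
    return Path(out_base + ".hocr")
```

For example, `run_tesseract("page.png", "page")` would produce `page.hocr` next to the image. Passing `lang="jpn+eng"` tells Tesseract to combine the two models, which can help on pages that mix Japanese and Latin text.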

At the end of the notebook, I also walk through converting the hOCR output into ALTO XML, using the following tool:

https://digi.bib.uni-mannheim.de/ocr-fileformat/

I hope this serves as a useful reference.