Ocr | Digital Archive Systems Tech Blog

Created Notebooks Using NDLOCR and NDL Classical Japanese OCR ver.2

Notice 2026-02-24 ! The notebooks provided on this page will no longer be updated. For NDLOCR, “NDLOCR-Lite” has been released as a desktop application and command-line tool for easy use. Please use this going forward. https://github.com/ndl-lab/ndlocr-lite 2025-04-02 There is currently a bug. Please refrain from using it until the fix is complete. The bug has been fixed. 2025-03-21 For NDL Classical Japanese OCR, “NDL Classical Japanese OCR-Lite” has been released as a desktop application for easy use. Please use this going forward. ...

Running NDL Classical Japanese OCR on mdx

Update History 2024-05-22 Added “Adding the user who executes Docker commands to the docker group.” Overview mdx is a data platform for industry-academia-government collaboration co-created by universities and research institutions. https://mdx.jp/ This time, we will use an mdx virtual machine to run NDL Classical Japanese OCR. https://github.com/ndl-lab/ndlkotenocr_cli Project Application This time, I selected “Trial” as the project type. With “Trial,” one GPU pack was allocated. Creating a Virtual Machine Deploy This time, I selected “01_Ubuntu-2204-server-gpu (Recommended).” ...

Mirador 3 Plugin Development: Adding Vertical Text Support to the Text Overlay Plugin

Overview Text Overlay plugin for Mirador 3 is a Mirador 3 plugin that displays selectable text overlays based on OCR or transcription. https://github.com/dbmdz/mirador-textoverlay A demo page is available at the following link. https://mirador-textoverlay.netlify.app/ However, when trying to display vertical text such as Japanese, it didn’t display correctly, as shown below. So I forked the above repository and made it possible to display vertical text as well. The source code is published in the following repository. (I hope to consider a pull request in the future.) ...

About ALTO (Analyzed Layout and Text Object) XML

Overview I am sharing the results of querying GPT-4 about ALTO (Analyzed Layout and Text Object) XML. https://www.loc.gov/standards/alto/ Required Elements ALTO (Analyzed Layout and Text Object) XML is an XML schema for representing OCR-generated text and its layout. Its structure is very flexible, with many elements and attributes, but the required elements are limited. The simplest form of ALTO XML has the following hierarchical structure: <alto>: The root element. It must have @xmlns and @xmlns:xsi attributes indicating the version of the ALTO XML schema. It must also have two child elements: <Description> and <Layout>. ...

Bug Fixes and Feature Additions to the NDL Classical Book OCR Tutorial Using Google Colab

Overview I have been creating a tutorial for the NDL “Classical Book” OCR application using Google Colab, as introduced in the following article. This time, the following updates were made. Added terms of use Fixed bugs Added support for IIIF Presentation API v3 manifest file input The updated notebook can be accessed at the same URL as before. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/NDL古典籍OCRの実行例.ipynb Terms of Use Please use the notebook itself under CC0. However, the “NDL Classical Book OCR Application” is released by the National Diet Library under the CC BY 4.0 license, so please include the appropriate credit. Also, please check the terms of use for the materials to which OCR is applied. ...

Web Application for NDL Classical Book OCR Using Hugging Face Space

Overview I created a web application for NDL Classical Book OCR using Hugging Face Space. You can try it at the link below. Upload an image, and after about 1 minute, the OCR result text and JSON data will be displayed. https://huggingface.co/spaces/nakamura196/ndl_kotenseki_ocr The following article was used as a reference for creating this application. https://qiita.com/relu/items/e882e23a9bd07243211b Choosing the Right Tool I have separately prepared a Google Colab tutorial as another environment for trying NDL Classical Book OCR. ...

Running NDL Classical Japanese OCR on Amazon EC2 CPU Environment

Overview This is a memo of running NDL Classical Japanese OCR on an Amazon EC2 CPU environment. The advantage is that it can be run without preparing an expensive GPU environment, but please note that it takes about 30 seconds to 1 minute per image. The following article was referenced when building this environment. https://qiita.com/relu/items/e882e23a9bd07243211b Instance Select Ubuntu from Quick Start. For the instance type, I recommend t2.medium or higher. Errors occurred with smaller instances. ...

Running NDL Classical Text OCR Using Amazon SageMaker Studio

Overview Previously, I created tutorials for NDL OCR and NDL Classical Text OCR using Google Cloud Platform and Google Colab. This time, I will explain how to run NDL Classical Text OCR using Amazon SageMaker Studio. Please note that this method incurs costs during execution. The description of Amazon SageMaker Studio is available at the following link: https://aws.amazon.com/jp/sagemaker/studio/ Domain Setup and Other Configuration For domain setup and other configuration, please refer to articles such as the following: ...

NDL Classical Text OCR Using Google Colab

Overview I created an NDL “Classical Text” OCR application using Google Colab. You can try it at the following URL. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/NDL古典籍OCRの実行例.ipynb The description of NDL Classical Text OCR is as follows. https://github.com/ndl-lab/ndlkotenocr_cli The notebook was created with reference to @blue0620’s notebook. Thank you! https://twitter.com/blue0620/status/1617888733323485184 In the notebook I created, I added support for additional input formats and a feature to save to Google Drive. How to Use The usage is almost the same as the NDLOCR application. Please refer to the following video. ...

Building a Layout Extraction Model Using the NDL-DocL Dataset and YOLOv5

Overview I built a layout extraction model using the NDL-DocL dataset and YOLOv5. https://github.com/ndl-lab/layout-dataset https://github.com/ultralytics/yolov5 You can try this model using the following notebook. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/NDL_DocLデータセットとYOLOv5を用いたレイアウト抽出モデル.ipynb This article is a record of the training process above. Creating the Dataset The NDL-DocL dataset in Pascal VOC format is converted to YOLO format. For this method, refer to the following article. In addition to the conversion from Pascal VOC format to COCO format, conversion from COCO format to YOLO format was added. ...

NDL OCR Now Supports Ruby (Furigana) Text Extraction

Overview For NDL OCR, the default setting previously did not include ruby (furigana) text extraction. Thanks to the cooperation of the NDL team, it is now possible to configure whether or not to perform text extraction for ruby. https://github.com/ndl-lab/ndlocr_cli/ Setting the following to True in config.yaml enables the ruby text extraction feature. y i e l d _ b l o c k _ r u b i : F a l s e Please note the following caveats when using this feature: ...

Created a Video on How to Use the NDLOCR App with Google Colab

I created a video on how to use the NDLOCR app with Google Colab. I hope it serves as a useful reference. https://youtu.be/46p7ZZSul0o The blog used in the video is the following. Note that the “Initial Setup” portion has been trimmed in the video. In reality, it takes about 3-5 minutes, so please be aware.

Running gcv2hocr on Google Colab: Creating Searchable PDFs with Transparent Text Using Google Vision API

Overview gcv2hocr is a repository that converts Google Cloud Vision OCR output to hOCR format and creates searchable PDFs. https://github.com/dinosauria123/gcv2hocr I created a notebook to run the above repository on Google Colab. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/gcv2hocrの実行サンプル.ipynb As shown below, you can create searchable PDF files. How to Use Access the following notebook. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/gcv2hocrの実行サンプル.ipynb First, obtain an API key to use the Google Cloud Vision API. The following article may be helpful. https://zenn.dev/tmitsuoka0423/articles/get-gcp-api-key ...

Created Version 2 of the NDLOCR App Using Google Colab

Announcements Notebook URL https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/ndl_ocr_v2.ipynb 2022-07-06 A demo video showing how to use it has been created. https://youtu.be/46p7ZZSul0o Additionally, a ruby (furigana) text conversion feature has been added. Overview I created an NDLOCR app using Google Colab and introduced it in the following article. This time, I created Version 2, an improved version of the above notebook. You can access the notebook from the following link. https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/ndl_ocr_v2.ipynb Features Support for multiple input formats has been added. The following options are available: ...

Execution Time for NDLOCR Using Google Colab

I recently wrote the following article: This time, I conducted a brief investigation on the execution time of NDLOCR using Google Colab, and here are the results. Configuration The GPU used was: F r = = i N G F = N P = V P a = / r G = N A I U n = 0 A o P = o p D = c U = r I = e = r A N T = T s = u 2 - a e = e 3 s = n 9 S m m = s 5 e G I = n M e p = l C s I D = i 0 I = a : = n 6 = = g : 4 P = V = 2 6 e = 1 C I = p 6 0 r = 0 P I D = r : . f = 0 0 = o 2 3 = - = c 9 2 = S = e . P P = X = s 2 0 e w = M = s 0 3 r r = 2 2 = e 2 s : = . 3 = s 2 i U = . W = s s = . P = f t a = / I = o D e g = D = u r n e = O 3 = n i c / = f 0 = d v e C = f 0 = e - a = W T = r M p = y = p = V = e = e B = 0 = r u = 0 = s s = 0 = i - = 0 P = o I = 0 r = n d = 0 0 o = : = 0 M c = = 0 i e = 4 M = : B s = 6 e = 0 s = 0 m = 0 / = . o = : n = 3 r = 0 1 a = 2 y = 4 6 m = . D - = . 1 e = 0 i U = 0 6 = 3 s s = 0 = p a = O M = . g = f i = A e = f B = = = C = U = = D V G = = A o P = = l U = = V a - = = e t U = = r i t = 0 = s l i = % = i e l = = o = = n U = = : n C = G U = c o = P s = 1 o m = U a = 1 r p = D g = . r u M = e M e = 2 . t I = f e = e G = a m = E = u N o = C M M = l / r = C . . = 0 t A y = = = The following image was used. The size was 5000 x 3415 px, 1.1 MB: ...

Running NDLOCR App with Google Colab (Image Input and Result Saving via Google Drive)

Overview Previously, I shared a method for running the NDLOCR app using Google Cloud Platform’s Compute Engine. However, the above method involves somewhat cumbersome procedures and incurs costs. While it is suitable for production environments, it presented a high barrier for small-scale or experimental use. To address this issue, @blue0620 created a method for running the NDLOCR app using Google Colab. https://twitter.com/blue0620/status/1519294332159012864 By using the above notebook, you can easily (with one click from “Runtime” > “Run all”) and freely run OCR. ...

Running the NDLOCR Application Using Google Cloud Platform Compute Engine

Overview This is a memo about running the NDLOCR application published by NDL (National Diet Library) using a virtual machine on GCP (Google Cloud Platform). For details about this application, please refer to the following repository. https://github.com/ndl-lab/ndlocr_cli Creating a VM Instance Access Compute Engine on GCP and click the “Create Instance” button at the top of the screen. Under “Machine configuration” > “Machine family”, select “GPU”. Then for “GPU type”, select “NVIDIA T4”, which is the most affordable option. Set “Number of GPUs” to 1. ...

Added TEI/XML Download Functionality to the "NDL OCR x IIIF" App

I added the ability to download OCR results in TEI/XML format to the app that allows viewing OCR results published in the National Diet Library’s “Next-Generation Digital Library” using an IIIF viewer. https://static.ldas.jp/ndl-ocr-iiif/ Please also refer to the following article about this app. In adding this feature, I updated the UI. The results are divided into “Viewer” and “Data.” For “Viewer,” in addition to the previously provided “Mirador” and “Curation Viewer,” I added “Universal Viewer” and “Image Annotator.” I also added a link to the “Next-Generation Digital Library” and implemented a page called “TEI Viewer” as a simple viewer for TEI/XML files. ...

[Development Guide] I Created an App to View OCR Results Published by the National Diet Library's Next-Generation Digital Library in an IIIF Viewer

Overview I created an app to view OCR results published by the National Diet Library’s “Next-Generation Digital Library” in an IIIF viewer. The usage instructions are summarized in the following article. This time, I will explain how to build the above app. Build Method Backend I used AWS. The system was primarily built using SAM (Serverless Application Model). Creating IIIF Manifests & Curation Lists The flow for generating IIIF manifests and curation lists reflecting the OCR results published by the Next-Generation Digital Library is as follows. ...

An App for Viewing OCR Results from the NDL "Next-Generation Digital Library" in an IIIF Viewer

Overview I created an app for viewing OCR results published on the National Diet Library’s “Next-Generation Digital Library” in an IIIF viewer. You can try it at the following URL. https://static.ldas.jp/ndl-ocr-iiif/ Usage Enter the ID of a material published on the “Next-Generation Digital Library” in the input form. After a short while, buttons for “Mirador” and CODH’s “Curation Viewer” will appear. You can view the OCR results in each viewer. ...