I created a notebook that takes the NDL-DocL Dataset (Document Image Layout Dataset) published by NDL Lab, converts its Pascal VOC format XML files to COCO format JSON, and visualizes the contents.
https://github.com/nakamura196/ndl_ocr/blob/main/NDL_DocLデータセット(資料画像レイアウトデータセット)の変換と可視化.ipynb
Open the notebook above and select “Runtime” > “Run all cells” to run the conversion and visualization.
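For readers who want a feel for what a Pascal VOC to COCO conversion involves, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code. The function name `voc_to_coco`, the sample XML, and the category names `text_block` and `illustration` are my own illustrative assumptions, not labels taken from the dataset. The main point is that Pascal VOC stores boxes as corner coordinates (xmin, ymin, xmax, ymax), while COCO stores them as [x, y, width, height].

```python
import json
import xml.etree.ElementTree as ET


def voc_to_coco(xml_strings):
    """Convert Pascal VOC annotation XML strings into one COCO-style dict."""
    images, annotations, categories = [], [], {}
    for image_id, xml_text in enumerate(xml_strings, start=1):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        images.append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            name = obj.findtext("name")
            # Assign 1-based category ids in order of first appearance.
            category_id = categories.setdefault(name, len(categories) + 1)
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (
                int(float(box.findtext(tag)))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            # VOC uses corner coordinates; COCO uses [x, y, width, height].
            width, height = xmax - xmin, ymax - ymin
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": category_id,
                "bbox": [xmin, ymin, width, height],
                "area": width * height,
                "iscrowd": 0,
            })
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in categories.items()],
    }


# Hypothetical sample annotation (not taken from the NDL-DocL dataset).
SAMPLE_XML = """<annotation>
  <filename>sample.jpg</filename>
  <size><width>800</width><height>600</height><depth>3</depth></size>
  <object>
    <name>text_block</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>illustration</name>
    <bndbox><xmin>200</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
</annotation>"""

coco = voc_to_coco([SAMPLE_XML])
print(json.dumps(coco, indent=2))
```

In practice you would read each XML file from disk and pass its text to the function, then write the result with `json.dump`.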
Running the notebook creates the “/content/img” folder and the “/content/dataset_kotenseki.json” file, which you can use with machine learning programs that require COCO format data.
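As a quick sanity check before feeding a COCO format file to a training program, you can load it and count the boxes per category. The sketch below is an assumption-laden example, not part of the notebook: the helper names `load_coco` and `count_boxes_per_category` are hypothetical, and it runs against a tiny stand-in file with made-up category names so that it works anywhere. To check the real output, pass the path of the generated JSON file to `load_coco` instead.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def load_coco(json_path):
    """Read a COCO format JSON file into a dict."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def count_boxes_per_category(coco):
    """Return a Counter mapping category name to number of annotations."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(id_to_name[a["category_id"]] for a in coco["annotations"])


# Tiny stand-in data with hypothetical category names.
tiny = {
    "images": [{"id": 1, "file_name": "sample.jpg", "width": 800, "height": 600}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 200]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [150, 20, 100, 200]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [300, 50, 100, 100]},
    ],
    "categories": [
        {"id": 1, "name": "text_block"},
        {"id": 2, "name": "illustration"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tiny_coco.json"
    path.write_text(json.dumps(tiny), encoding="utf-8")
    counts = count_boxes_per_category(load_coco(path))

print(dict(counts))
```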
I hope this is helpful.