Overview

For the annual Digital Humanities conference and the annual conference of the Japanese Association for Digital Humanities (JADH), submissions are commonly prepared with a tool called DHConvalidator, which converts DOCX or ODT files into DHC files.

https://github.com/ADHO/dhconvalidator

This article is a set of notes for understanding the DHC format.

Examining the Contents

DHC files are described as follows.

This is essentially a ZIP archive containing their original ODT/DOCX file, an HTML rendering and an XML-TEI rendering, plus a folder with the image files, properly renamed.

Therefore, let’s extract a .dhc file created using the dhconvalidator mentioned above.

unzip nakamura.dhc

Extraction produced the following: in addition to the original docx, the archive contained an XML file converted to TEI and an HTML file. The HTML file can be opened directly in a browser.

To check the contents of DHC files created by co-authors or others, it seems best to extract them as shown above and check the HTML file.
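The same check can be scripted. The following is a minimal sketch in Python, assuming only what is stated above: that a .dhc file is an ordinary ZIP archive and that the HTML rendering inside it has an .html or .xhtml extension. The function names list_dhc_entries and extract_html are hypothetical helpers written for this note and are not part of DHConvalidator.

```python
import zipfile
from pathlib import Path


def list_dhc_entries(dhc_path):
    """Return the names of all entries stored in a .dhc (ZIP) archive."""
    with zipfile.ZipFile(dhc_path) as archive:
        return archive.namelist()


def extract_html(dhc_path, out_dir):
    """Extract only the HTML rendering(s) and return the extracted paths."""
    out_dir = Path(out_dir)
    extracted = []
    with zipfile.ZipFile(dhc_path) as archive:
        for name in archive.namelist():
            # Assumption: the HTML rendering ends in .html or .xhtml.
            if name.lower().endswith((".html", ".xhtml")):
                # ZipFile.extract returns the path of the written file.
                extracted.append(Path(archive.extract(name, path=out_dir)))
    return extracted
```

For example, list_dhc_entries("nakamura.dhc") shows what a co-author's file contains without unpacking it, and extract_html("nakamura.dhc", "out") writes only the HTML rendering to the out directory so it can be opened in a browser.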


I haven’t investigated, but there may be viewers specifically designed for viewing DHC files.

(Reference) Conversion Mechanism

It is described as follows.

The DHConvalidator works together with the conference management tool ConfTool and uses TEIGarage (formerly known as OxGarage) to do the bulk conversion.

TEIGarage is briefly covered in the following article.

In DHConvalidator, the conversion appears to be performed in the following file.

https://github.com/ADHO/dhconvalidator/blob/master/src/main/java/org/adho/dhconvalidator/conversion/Converter.java

The conversion from docx (or odt) to xml appears to be performed in the following section.

OxGarageConversionClient oxGarageConversionClient =
    new OxGarageConversionClient(...BaseURL);
String xmlFileName = ... + ".xml";
ZipResult zipResult = new ZipResult(
    oxGarageConversionClient.convert(
        sourceData,
        ...Path,
        ...Path.getDefaultProperties()),
    ...

Subsequently, the conversion from XML to HTML appears to be performed in the following section.

Document ... = oxGarageConversionClient...(
    ...,
    ConversionPath.TEI_TO_XHTML,
    ConversionPath.TEI_TO_XHTML.getDefaultProperties());

This confirms that it is built on top of the TEI/XML ecosystem.
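To make the two-step flow concrete (DOCX or ODT to TEI, then TEI to XHTML), here is a rough Python sketch that only builds the two conversion URLs in order. Everything specific in it is an assumption and has not been checked against the TEIGarage documentation: the "name:mime:type" format identifiers, the Conversions/<input>/<output>/ path layout (modeled on OxGarage-style REST URLs), and the example.org base URL are all illustrative. It sends no requests.

```python
from urllib.parse import quote

# Assumed format identifiers in an OxGarage-style "name:mime:type" form.
DOCX = "docx:application:vnd.openxmlformats-officedocument.wordprocessingml.document"
ODT = "odt:application:vnd.oasis.opendocument.text"
TEI = "TEI:text:xml"
XHTML = "xhtml:application:xhtml+xml"


def conversion_url(base_url, source_format, target_format):
    """Build a URL of the assumed form <base>/Conversions/<in>/<out>/."""
    # Percent-encode ":" and "+" so each identifier is a single path segment.
    parts = [quote(fmt, safe="") for fmt in (source_format, target_format)]
    return f"{base_url.rstrip('/')}/Conversions/{parts[0]}/{parts[1]}/"


def conversion_chain(base_url, source_format):
    """Return the URLs for source -> TEI and TEI -> XHTML, in that order."""
    return [
        conversion_url(base_url, source_format, TEI),
        conversion_url(base_url, TEI, XHTML),
    ]
```

Calling conversion_chain("https://example.org/ege-webservice", DOCX) returns two URLs that mirror the order of the steps in Converter.java: the source document goes to TEI first, and the resulting TEI is then converted to XHTML.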

Summary

I hope this serves as a helpful reference for understanding the DHC format, DHConvalidator, TEIGarage, and related tools.