Overview

The XML Schema Definition (XSD) files for the JPCOAR Schema are published in the following repository. Many thanks to everyone who created the schema and made it available.

https://github.com/JPCOAR/schema

This article is a memo on validating XML files against the schema above. (This is my first attempt at this kind of validation, so some of the terminology or details may be inaccurate. My apologies.)

A Google Colab notebook is also available.

https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/JPCOARスキーマを用いたxmlファイルのバリデーション.ipynb

Preparation

Clone the repository

cd /content/
git clone https://github.com/JPCOAR/schema.git

Install the library

pip install xsd-validator

Load the XSD file (v1)

from xsd_validator import XsdValidator
validator = XsdValidator('/content/schema/1.0/jpcoar_scm.xsd')

Trying v1

OK Example

<?xml version="1.0"?>
<jpcoar:jpcoar
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
  <dc:title>JPCOARスキーマを用いたxmlファイルのバリデーション</dc:title>
  <dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>
</jpcoar:jpcoar>

validator.assert_valid("/content/ok.xml")
# No error
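The validator reads the XML from a file, so the document has to be written to disk first. The following is a minimal sketch, using only the Python standard library, that writes a sample document and checks that it is well-formed. Well-formedness is a weaker check than XSD validation, but it catches typos before the validator runs. The helper name write_and_check, the shortened sample document with its placeholder title, and the temporary directory standing in for /content are assumptions made for illustration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical, shortened sample document (placeholder title).
SAMPLE_XML = """<?xml version="1.0"?>
<jpcoar:jpcoar
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dc:title>sample title</dc:title>
  <dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>
</jpcoar:jpcoar>
"""


def write_and_check(path: Path, xml_text: str) -> str:
    """Write xml_text to path and return the root tag if the file parses."""
    path.write_text(xml_text, encoding="utf-8")
    # ET.parse raises ET.ParseError when the file is not well-formed XML.
    root = ET.parse(str(path)).getroot()
    return root.tag


# A temporary directory stands in for /content in this sketch.
with tempfile.TemporaryDirectory() as tmp:
    root_tag = write_and_check(Path(tmp) / "ok.xml", SAMPLE_XML)
    print(root_tag)
```

The returned tag is in ElementTree's `{namespace}localname` form, which makes it easy to confirm that the root element is in the JPCOAR namespace.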

NG Example

Placing jpcoar:subject after dc:type seems to cause an error.

<?xml version="1.0"?>
<jpcoar:jpcoar
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
  <dc:title>JPCOARスキーマを用いたxmlファイルのバリデーション</dc:title>
  <dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>
  <jpcoar:subject subjectScheme="Other">...</jpcoar:subject>
</jpcoar:jpcoar>

validator.assert_valid("/content/ng.xml")

XsdValidationErrorWithInfo: /content/ng.xml: line 9 column 41: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"https://github.com/JPCOAR/schema/blob/master/1.0/":subject}'. One of '{"https://schema.datacite.org/meta/kernel-4/":version, "http://namespace.openaire.eu/schema/oaire/":version, "https://github.com/JPCOAR/schema/blob/master/1.0/":identifier, "https://github.com/JPCOAR/schema/blob/master/1.0/":identifierRegistration, "https://github.com/JPCOAR/schema/blob/master/1.0/":relation, "http://purl.org/dc/terms/":temporal, "https://schema.datacite.org/meta/kernel-4/":geoLocation, "https://github.com/JPCOAR/schema/blob/master/1.0/":fundingReference, "https://github.com/JPCOAR/schema/blob/master/1.0/":sourceIdentifier, "https://github.com/JPCOAR/schema/blob/master/1.0/":sourceTitle, "https://github.com/JPCOAR/schema/blob/master/1.0/":volume, "https://github.com/JPCOAR/schema/blob/master/1.0/":issue, "https://github.com/JPCOAR/schema/blob/master/1.0/":numPages, "https://github.com/JPCOAR/schema/blob/master/1.0/":pageStart, "https://github.com/JPCOAR/schema/blob/master/1.0/":pageEnd, "http://ndl.go.jp/dcndl/terms/":dissertationNumber, "http://ndl.go.jp/dcndl/terms/":degreeName, "http://ndl.go.jp/dcndl/terms/":dateGranted, "https://github.com/JPCOAR/schema/blob/master/1.0/":degreeGrantor, "https://github.com/JPCOAR/schema/blob/master/1.0/":conference, "https://github.com/JPCOAR/schema/blob/master/1.0/":file}' is expected.
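The message above is long, but it has a regular shape: a line and column, an error code, the element that was found, and the list of elements that would have been accepted. The sketch below pulls those parts out with regular expressions so they are easier to read. It assumes the message uses straight quotes around namespaces, and the function name parse_validation_error and the shortened sample message are hypothetical. It is not part of xsd-validator.

```python
import re

# "line N column M: <code>: <detail>" as seen in the message above.
ERROR_PATTERN = re.compile(
    r"line (?P<line>\d+) column (?P<column>\d+): "
    r"(?P<code>[\w.\-]+): (?P<detail>.*)",
    re.DOTALL,
)
# Qualified names are printed as "namespace":localname.
QNAME_PATTERN = re.compile(r'"(?P<ns>[^"]+)":(?P<local>\w+)')


def parse_validation_error(message: str) -> dict:
    """Split a validation message into position, code, found and expected names."""
    match = ERROR_PATTERN.search(message)
    if match is None:
        raise ValueError("unrecognized error message format")
    qnames = [(m["ns"], m["local"]) for m in QNAME_PATTERN.finditer(match["detail"])]
    return {
        "line": int(match["line"]),
        "column": int(match["column"]),
        "code": match["code"],
        # The first qualified name is the offending element, the rest are expected.
        "found": qnames[0] if qnames else None,
        "expected": qnames[1:],
    }


# Shortened sample message (only two expected elements kept).
SAMPLE_MESSAGE = (
    "/content/ng.xml: line 9 column 41: cvc-complex-type.2.4.a: "
    "Invalid content was found starting with element "
    "'{\"https://github.com/JPCOAR/schema/blob/master/1.0/\":subject}'. "
    "One of '{\"https://schema.datacite.org/meta/kernel-4/\":version, "
    "\"http://namespace.openaire.eu/schema/oaire/\":version}' is expected."
)

info = parse_validation_error(SAMPLE_MESSAGE)
print(info["line"], info["column"], info["found"][1])
print([local for _, local in info["expected"]])
```

Read this way, the message says that jpcoar:subject appeared at a point where only elements that come after dc:type were allowed, which points to an ordering problem rather than an unknown element.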

Fix

Try placing dc:type after jpcoar:subject

<?xml version="1.0"?>
<jpcoar:jpcoar
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
  <dc:title>JPCOARスキーマを用いたxmlファイルのバリデーション</dc:title>
  <jpcoar:subject subjectScheme="Other">...</jpcoar:subject>
  <dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>
</jpcoar:jpcoar>

validator.assert_valid("/content/fix.xml")
# No error
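The same fix can be applied programmatically when many files have the elements in the wrong order. The following is a rough sketch using the standard library's ElementTree. The order list covers only the three elements used in this article and is my own assumption, not the full sequence defined in the XSD. The function name reorder_children is also hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
JPCOAR = "https://github.com/JPCOAR/schema/blob/master/1.0/"

# Assumed partial order: only the elements that appear in this article.
ASSUMED_ORDER = [f"{{{DC}}}title", f"{{{JPCOAR}}}subject", f"{{{DC}}}type"]


def reorder_children(root, order):
    """Sort the direct children of root by their position in order."""
    rank = {tag: i for i, tag in enumerate(order)}
    # sorted() is stable, so repeated elements keep their relative order.
    # Tags missing from order are moved to the end, which may not be
    # what the real schema requires.
    root[:] = sorted(root, key=lambda el: rank.get(el.tag, len(order)))
    return root


# Same shape as the NG example: subject placed after type.
ng_xml = (
    f'<jpcoar:jpcoar xmlns:jpcoar="{JPCOAR}" xmlns:dc="{DC}">'
    "<dc:title>sample title</dc:title>"
    "<dc:type>article</dc:type>"
    '<jpcoar:subject subjectScheme="Other">sample subject</jpcoar:subject>'
    "</jpcoar:jpcoar>"
)

fixed_root = reorder_children(ET.fromstring(ng_xml), ASSUMED_ORDER)
print([el.tag.split("}")[1] for el in fixed_root])
```

A reordered file should still be passed through validator.assert_valid afterwards, since this sketch knows nothing about the rest of the schema.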

Summary

Based on the error messages, we were able to fix the XML file.
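When checking several files at once, it can help to collect the error messages instead of stopping at the first failure. The sketch below assumes only that the validator object has an assert_valid(path) method that raises on invalid input, as in the calls above. The function name collect_errors and the stub validator used for the demonstration are hypothetical, and the exception is caught generically because I have not confirmed where the library's exception class is exported from.

```python
def collect_errors(validator, paths):
    """Return a dict mapping each path to its error message, or None if valid."""
    results = {}
    for path in paths:
        try:
            validator.assert_valid(path)
        except Exception as exc:  # generic on purpose; see the note above
            results[path] = str(exc)
        else:
            results[path] = None
    return results


class StubValidator:
    """Hypothetical stand-in so the sketch runs without Java or an XSD file."""

    def assert_valid(self, path):
        if path.endswith("ng.xml"):
            raise ValueError(f"{path}: invalid content")


results = collect_errors(
    StubValidator(),
    ["/content/ok.xml", "/content/ng.xml", "/content/fix.xml"],
)
for path, error in results.items():
    print(path, "->", error or "valid")
```

In the notebook, the validator created earlier with XsdValidator could be passed in place of StubValidator.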

The Google Colab notebook also includes validation examples targeting JPCOAR Schema Version 2.0.

Some details may be inaccurate, but I hope this serves as a useful reference for validating XML files.