This article was written by an AI and then manually verified.

Introduction

When editing TEI (Text Encoding Initiative) XML, in addition to structural validation of elements and attributes, more complex business rule validation may be needed. This article explains how to combine RELAX NG (RNG) and Schematron to achieve both structural and content validation, using challenges encountered in an actual project as examples.

The Problem to Solve

When editing classical Japanese literary texts in TEI XML, the following requirements arose:

  1. Dynamic validation of ID references: Validate that IDs referenced by corresp attributes actually exist in witness elements within the document
  2. Completion functionality in Oxygen XML Editor: Automatically display ID candidates during editing
  3. Multiple ID reference support: Allow specifying multiple IDs separated by spaces
  4. Restricting references to specific elements: Only allow references to witness element IDs, and error if person element IDs are included

Why RNG + Schematron?

RELAX NG Strengths

  • Element and attribute structure definition
  • Data type specification
  • Basic content model definition

Schematron Strengths

  • XPath-based complex validation rules
  • Cross-reference checks within documents
  • Custom error message provision

Combining these two enables strict validation from both structural and content perspectives.

Implementation Examples

1. Basic RNG Schema Structure

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:sch="http://purl.oclc.org/dsdl/schematron"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         ns="http://www.tei-c.org/ns/1.0">
  <!-- Schematron namespace declaration -->
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
  <!-- Embed Schematron rules here -->
  <start>
    <ref name="TEI"/>
  </start>
  <!-- Structure definition by RNG -->
</grammar>

2. ID Definition and Use of anyURI Type

Use the anyURI type to achieve auto-completion in Oxygen XML Editor:

<!-- Witness list -->
<define name="listWit">
  <element name="listWit">
    <oneOrMore>
      <element name="witness">
        <attribute name="xml:id">
          <data type="ID"/>
        </attribute>
        <text/>
      </element>
    </oneOrMore>
  </element>
</define>
<!-- Base text reading -->
<define name="lem">
  <element name="lem">
    <attribute name="corresp">
      <a:documentation>
        Reference to witnesses
        Internal reference in IDREF format with #
        In Oxygen, a list of xml:id with # is displayed
      </a:documentation>
      <list>
        <oneOrMore>
          <data type="anyURI"/>
        </oneOrMore>
      </list>
    </attribute>
    <text/>
  </element>
</define>

Key points:

  • data type="ID" guarantees uniqueness
  • data type="anyURI" allows internal references with #
  • The list element allows space-separated multiple values
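To make the structural side concrete, here is a rough sketch of instance fragments and how a list of one or more anyURI tokens would treat them. The IDs w1 and w2 and the element content are made up for illustration; they are not taken from the project.

```xml
<!-- Hypothetical instance fragments; the IDs are illustrative only -->

<!-- Accepted: one or more whitespace-separated anyURI tokens -->
<lem corresp="#w1">text</lem>
<lem corresp="#w1 #w2">text</lem>

<!-- Rejected: the list pattern requires at least one token -->
<lem corresp="">text</lem>

<!-- Also accepted by RELAX NG: anyURI only checks the lexical form,
     not whether the target exists or is a witness -->
<lem corresp="#no-such-id">text</lem>
```

The last fragment is the reason the structural schema alone is not enough: checking that each token points to an existing witness is left to Schematron.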

3. Advanced Validation with Schematron

<sch:pattern id="witness-references">
  <sch:title>Witness ID Reference Validation</sch:title>
  <sch:rule context="tei:lem[@corresp]">
    <sch:let name="listWitIds" value="//tei:listWit/tei:witness/@xml:id"/>
    <sch:let name="listPersonIds" value="//tei:listPerson/tei:person/@xml:id"/>
    <sch:let name="correspTokens" value="tokenize(normalize-space(@corresp), '\s+')"/>
    <!-- Should only reference witnesses -->
    <sch:assert test="every $token in $correspTokens
                      satisfies (
                        starts-with($token, '#') and
                        substring($token, 2) = $listWitIds
                      )" role="error">
      The corresp attribute should only reference witness IDs.
      Available witness IDs: #<sch:value-of select="string-join($listWitIds, ', #')"/>
    </sch:assert>
    <!-- Error if person IDs are included -->
    <sch:report test="some $token in $correspTokens
                      satisfies (
                        starts-with($token, '#') and
                        substring($token, 2) = $listPersonIds
                      )" role="error">
      The corresp attribute contains person IDs.
      Detected person IDs: <sch:value-of select="
        string-join(
          for $token in $correspTokens
          return if (starts-with($token, '#') and substring($token, 2) = $listPersonIds)
                 then $token
                 else (),
          ', ')
      "/>
    </sch:report>
  </sch:rule>
</sch:pattern>

Key points:

  • Define variables with sch:let and dynamically retrieve values with XPath
  • Parse multiple ID references with tokenize()
  • sch:assert raises errors when conditions are not met
  • sch:report raises errors when conditions are met
  • role="error" specifies the error level (warning and info are also available)
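As a minimal sketch of how such a pattern can sit inside the RELAX NG grammar itself, the fragment below places one sch:pattern as a child of grammar. It assumes a validator that extracts embedded Schematron from the http://purl.oclc.org/dsdl/schematron namespace (Oxygen XML Editor is one such tool) and an XPath 2.0 query binding. The pattern id, the variable name witIds and the reduced start pattern are placeholders, not the project's actual schema.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:sch="http://purl.oclc.org/dsdl/schematron"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         ns="http://www.tei-c.org/ns/1.0">
  <sch:ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>

  <!-- Foreign-namespace element: RELAX NG treats it as an annotation,
       a Schematron-aware validator treats it as a rule set -->
  <sch:pattern id="example-embedded-pattern">
    <sch:rule context="tei:lem[@corresp]">
      <sch:let name="witIds" value="//tei:witness/@xml:id"/>
      <sch:assert test="every $t in tokenize(normalize-space(@corresp), '\s+')
                        satisfies substring($t, 2) = $witIds"
                  role="error">
        Every corresp token must point to a witness ID.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

  <!-- Deliberately reduced structure, just enough to host the rule -->
  <start>
    <element name="lem">
      <attribute name="corresp"/>
      <text/>
    </element>
  </start>
</grammar>
```

Keeping both layers in one file means a single schema association in the document covers structure and content rules, at the cost of a larger schema file.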

4. Actual Usage Example

<!-- Usage in XML document -->
<?xml-model href="schema.rng" type="application/xml"
    schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="schema.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <listWit>
            <witness xml:id="aaa">Witness A</witness>
            <witness xml:id="iii">Witness I</witness>
        </listWit>
        <listPerson>
            <person xml:id="abc">
                <persName>Person ABC</persName>
            </person>
        </listPerson>
    </teiHeader>
    <text>
        <body>
            <app>
                <!-- Correct example: referencing only witnesses -->
                <lem corresp="#aaa #iii">Main text</lem>
                <rdg corresp="#aaa">Alternative reading</rdg>
            </app>
            <app>
                <!-- Error example: includes person -->
                <lem corresp="#aaa #abc">Main text</lem>
                <rdg>Alternative reading</rdg>
            </app>
        </body>
    </text>
</TEI>

Implementation Notes

1. XPath 2.0 Syntax

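Pay attention to the for expression syntax in XPath expressions within Schematron. XPath has no FLWOR expression: each for or let clause must be followed by its own return, so the XQuery-style sequence for ... let ... return is rejected. Also note that let itself was added in XPath 3.0; with a strict XPath 2.0 query binding, inline the value (for example substring($token, 2)) instead of binding it to a variable: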

<!-- Correct -->
let $invalid :=
  for $token in $correspTokens
  return
    let $id := substring($token, 2)
    return if ($id = $validIds) then () else $token

<!-- Will cause an error -->
let $invalid := for $token in $correspTokens
                (let $id := substring($token, 2)
                 return if ($id = $validIds) then () else $token)

2. IDREF vs anyURI

  • IDREF type: Cannot include #, limiting completion in Oxygen
  • anyURI type: Allows values with #, and Oxygen automatically provides ID completion
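The difference is easiest to see with the two attribute definitions side by side. This is a sketch only, assuming the XML Schema datatype library is declared on the grammar; the attribute name corresp follows the article, everything else is illustrative.

```xml
<!-- Option A (sketch): IDREFS. Values are bare IDs such as "w1 w2";
     a leading # is not a valid name character, so "#w1" is rejected -->
<attribute name="corresp">
  <data type="IDREFS"/>
</attribute>

<!-- Option B (sketch): list of anyURI. Values look like "#w1 #w2",
     which matches the pointer style TEI uses for corresp -->
<attribute name="corresp">
  <list>
    <oneOrMore>
      <data type="anyURI"/>
    </oneOrMore>
  </list>
</attribute>
```

Option B gives up the built-in reference check that ID/IDREF validation can provide, so the existence of each target has to be verified separately, which is the job of the Schematron rules in this article.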

3. Schematron’s role Attribute

  • role="error": Red error marker
  • role="warning": Yellow warning marker
  • role="info": Blue information marker
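A small sketch of the three levels in one rule follows. The pattern id, the tests and the messages are invented for illustration, and the tei prefix is assumed to be declared with sch:ns elsewhere in the schema; how each role is rendered depends on the editor.

```xml
<sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron"
             id="role-levels-example">
  <sch:rule context="tei:witness">
    <!-- Hard failure: a witness without an ID cannot be referenced -->
    <sch:assert test="@xml:id" role="error">
      witness must have an xml:id.
    </sch:assert>
    <!-- Softer signal: usable, but probably incomplete -->
    <sch:report test="normalize-space(.) = ''" role="warning">
      witness has no descriptive text.
    </sch:report>
    <!-- Purely informational; fires for every witness -->
    <sch:report test="true()" role="info">
      witness ID: <sch:value-of select="@xml:id"/>
    </sch:report>
  </sch:rule>
</sch:pattern>
```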

Application Examples

Complex Cross-Reference Validation

<sch:pattern id="cross-references">
  <!-- app element must have exactly one lem element -->
  <sch:rule context="tei:app">
    <sch:assert test="count(tei:lem) = 1">
      app element must have exactly one lem element
    </sch:assert>
  </sch:rule>
  <!-- rdg element's corresp must not duplicate lem element's -->
  <sch:rule context="tei:rdg[@corresp]">
    <sch:let name="lemCorresp" value="../tei:lem/@corresp"/>
    <sch:assert test="not(@corresp = $lemCorresp)">
      rdg element's corresp must be different from lem element's
    </sch:assert>
  </sch:rule>
</sch:pattern>

Conditional Required Attributes

<sch:pattern id="conditional-attributes">
  <sch:rule context="tei:date[@when]">
    <!-- when attribute must be in ISO format -->
    <sch:assert test="matches(@when, '^\d{4}-\d{2}-\d{2}$')">
      when attribute must be specified in YYYY-MM-DD format
    </sch:assert>
  </sch:rule>
</sch:pattern>

Summary

By combining RELAX NG and Schematron:

  1. Separation of structural and content validation: Design leveraging each tool’s strengths
  2. Dynamic validation rules: Flexible validation based on document content
  3. Editor support: Advanced editing assistance in Oxygen XML Editor and similar tools
  4. Clear error messages: Custom messages in any language

Especially for editing documents with complex structures like TEI XML, this combination becomes an extremely powerful tool.

The complete schema code introduced in this article is from an actual project. I hope it serves as a reference for those facing similar challenges.