Overview

This is a memo about hosting TEI/XML files on S3-compatible object storage. Specifically, we target the mdx I object storage.

https://mdx.jp/mdx1/p/about/system

Background

We are building a web application (Next.js) that loads TEI/XML files and visualizes their content. While the files were few and small, they were kept in the app's public folder. As their number and size grew, we looked for somewhere else to host them.

There are many options for storage locations, but this time we target mdx I’s S3-compatible object storage.

Uploading Files to Object Storage via GUI

There are many ways to upload TEI/XML files to object storage via GUI. Among them, I have previously introduced methods using Cyberduck and GakuNin RDM.

In this case, however, content other than TEI/XML was managed in Drupal. Therefore, we connected Drupal to the object storage so that users could complete everything through Drupal operations.

Connecting Drupal to Object Storage

The following module is used.

https://www.drupal.org/project/s3fs

After installation, select S3 File System from the configuration page /admin/config.

Then register the access key and secret key, along with the S3 bucket name.

Also, in Advanced Configuration Options under Custom Host Settings, enter https://s3ds.mdx.jp.

This completes the connection settings to the object storage.

After that, in the settings of the file field on each content type, select “S3 File System” as the upload destination.

Also, since TEI/XML files are the upload target in this case, enter xml as the “Allowed extensions.”
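Drupal rejects uploads whose extension is not in “Allowed extensions”, so a bulk script can check files locally before sending them. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the extra check that the root element is `<TEI>` in the TEI namespace is our own assumption, not something s3fs or Drupal performs.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Mirrors the "Allowed extensions: xml" setting on the Drupal file field
ALLOWED_EXTENSIONS = {"xml"}


def has_allowed_extension(filename: str) -> bool:
    """Return True if the file's last extension is in ALLOWED_EXTENSIONS."""
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return suffix in ALLOWED_EXTENSIONS


def is_tei_document(data: bytes) -> bool:
    """Return True if `data` is well-formed XML whose root is <TEI> in the TEI namespace."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    # ElementTree reports namespaced tags as "{namespace-uri}localname"
    return root.tag == f"{{{TEI_NS}}}TEI"
```

A file would then be uploaded only if `has_allowed_extension(path)` holds and `is_tei_document` returns True for its bytes.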

As a result, TEI/XML files uploaded through Drupal’s GUI are now stored in mdx I’s object storage.

(Reference) Bulk File Upload Using Drupal’s JSON:API

For the initial registration of TEI/XML files, bulk registration was performed using Python. The following article was helpful for bulk file upload methods using JSON:API.

https://www.drupal.org/node/3024331
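In short, JSON:API accepts a file for a file field of an existing entity as a POST of the raw file bytes to `/jsonapi/{entity type}/{bundle}/{uuid}/{field name}`. The sketch below only assembles that request (URL and headers) so its shape is easy to see. The function name is hypothetical, the entity type is fixed to `node`, and the header values match those used in the upload script. Actually sending the request still requires the session cookie obtained at login.

```python
from typing import Dict, Tuple


def build_file_upload_request(
    base_url: str,
    bundle: str,
    uuid: str,
    field: str,
    filename: str,
    csrf_token: str,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for uploading one file to a file field of an existing node."""
    # /jsonapi/{entity_type}/{bundle}/{uuid}/{field_name}, with entity type "node"
    url = f"{base_url.rstrip('/')}/jsonapi/node/{bundle}/{uuid}/{field}"
    headers = {
        "Accept": "application/vnd.api+json",
        # The request body is the raw file bytes, not a JSON document
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{filename}"',
        # Token from /session/token, valid only together with the session cookie
        "X-CSRF-Token": csrf_token,
    }
    return url, headers
```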

For example, we used a script like the following.

import os

import requests
from dotenv import load_dotenv


class ApiClient:
    def __init__(self):
        load_dotenv(override=True)

        # Drupal site URL
        self.DRUPAL_BASE_URL = os.getenv("DRUPAL_BASE_URL")
        # Authentication credentials
        self.USERNAME = os.getenv("DRUPAL_USERNAME")
        self.PASSWORD = os.getenv("DRUPAL_PASSWORD")

        self.session_cookies = None
        self.headers = None

    def login(self):
        # Log in and keep the session cookies for later requests
        login_url = f"{self.DRUPAL_BASE_URL}/user/login?_format=json"
        login_response = requests.post(
            login_url,
            json={"name": self.USERNAME, "pass": self.PASSWORD},
            headers={"Content-Type": "application/json"},
        )
        if login_response.status_code != 200:
            raise RuntimeError(f"Login failed: {login_response.status_code}")
        self.session_cookies = login_response.cookies

    def get_csrf_token(self):
        # Get a CSRF token bound to the logged-in session
        csrf_token_response = requests.get(
            f"{self.DRUPAL_BASE_URL}/session/token",
            cookies=self.session_cookies,
        )
        if csrf_token_response.status_code != 200:
            raise RuntimeError(f"Could not get CSRF token: {csrf_token_response.status_code}")
        self.headers = {
            "Content-Type": "application/vnd.api+json",
            "Accept": "application/vnd.api+json",
            "X-CSRF-Token": csrf_token_response.text,
        }

    def upload_file(self, content_type, uuid, field, file_path, verbose=False):
        if self.headers is None:
            raise RuntimeError("Call login() and get_csrf_token() first")
        # JSON:API endpoint for the file field of an existing node
        url = f"{self.DRUPAL_BASE_URL}/jsonapi/node/{content_type}/{uuid}/{field}"
        # Get filename
        filename = os.path.basename(file_path)
        # Read file in binary mode
        with open(file_path, "rb") as f:
            file_data = f.read()
        headers = self.headers.copy()
        headers["Content-Type"] = "application/octet-stream"
        headers["Content-Disposition"] = f'attachment; filename="{filename}"'
        # Upload file
        response = requests.post(
            url, headers=headers, cookies=self.session_cookies, data=file_data
        )
        if response.status_code == 200:
            if verbose:
                print(f"File upload successful: {filename}")
        else:
            print(f"File upload failed: {response.status_code} {response.text}")

This can be used to upload files to a field such as field_file on content that has already been created.

There may be better approaches, but the class can be used as follows.

client = ApiClient()
client.login()
client.get_csrf_token()

uuid = "cefa8076-4ddf-4c05-a03d-fcdebbf0c209"
file = "<file path>"
content_type = "<content type>"
field = "field_file"

client.upload_file(content_type, uuid, field, file)
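For the initial bulk registration, each file has to be matched with the UUID of the node it belongs to. How that mapping was obtained is not described here. The sketch below assumes a hypothetical dict from file name (without extension) to node UUID, for example built from a JSON:API listing of the target content type. Unmatched files are returned separately so they can be reviewed before anything is uploaded.

```python
from pathlib import PurePath
from typing import Dict, Iterable, List, Tuple


def plan_uploads(
    file_paths: Iterable[str], uuid_by_stem: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair each file with the UUID of its target node.

    `uuid_by_stem` maps a file name without extension (e.g. "letter001")
    to a node UUID; files with no entry end up in `unmatched`.
    """
    planned: List[Tuple[str, str]] = []
    unmatched: List[str] = []
    for path in sorted(file_paths):
        uuid = uuid_by_stem.get(PurePath(path).stem)
        if uuid is None:
            unmatched.append(path)
        else:
            planned.append((uuid, path))
    return planned, unmatched
```

Each `(uuid, path)` pair can then be passed to `client.upload_file(content_type, uuid, "field_file", path)` in a loop, for example over the result of `glob("data/*.xml")`.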

The following environment variables are required.

DRUPAL_BASE_URL=
DRUPAL_USERNAME=
DRUPAL_PASSWORD=
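Because `os.getenv` returns `None` for unset variables, a missing value otherwise only shows up later as a failed login or a malformed URL. A small check like the following can be run before `login()`. The helper names are hypothetical, and empty values (as in the template) are treated as missing.

```python
import os
from typing import List, Mapping

REQUIRED_DRUPAL_VARS = ("DRUPAL_BASE_URL", "DRUPAL_USERNAME", "DRUPAL_PASSWORD")


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Names of required variables that are unset or empty."""
    return [name for name in REQUIRED_DRUPAL_VARS if not env.get(name)]


def load_drupal_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the three settings, failing early if any of them is missing."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_DRUPAL_VARS}
```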

Using from Next.js

TEI/XML files uploaded to the object storage are loaded from applications such as Next.js.

The following library was used successfully.

https://www.npmjs.com/package/@aws-sdk/client-s3

Specifically, it was used as follows.

import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';
import { DOMParser } from '@xmldom/xmldom';

// Parse an XML string into a DOM Document
export const convertToXml = (content: string) => {
  const parser = new DOMParser();
  const xml = parser.parseFromString(content, 'text/xml');
  return xml;
};

// Fetch a TEI/XML file from the object storage and parse it
export const getXml = async (id: string) => {
  const client = new S3Client({
    region: 'us-east-1',
    endpoint: process.env.S3_ENDPOINT,
    credentials: {
      accessKeyId: process.env.S3_ACCESS_KEY_ID!,
      secretAccessKey: process.env.S3_SECRET_ACCESS_KEY!,
    },
  });
  const command = new GetObjectCommand({
    Bucket: process.env.S3_BUCKET,
    Key: `xml/${id}.xml`, // assumed object key layout; adjust to your bucket
  });
  const response = await client.send(command);
  const content = await response.Body?.transformToString();
  if (!content) {
    return null;
  }
  return convertToXml(content);
};

It is used with the following environment variables.

S3_ACCESS_KEY_ID=
S3_SECRET_ACCESS_KEY=
S3_ENDPOINT=https://s3ds.mdx.jp
S3_BUCKET=
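The same variables can also be reused from Python scripts, for example to check that an uploaded file is readable from the bucket. This sketch assumes boto3, which is not part of the setup described here. It only builds the client arguments, so it can be tested without network access. `S3_REGION` is a hypothetical extra variable, and `us-east-1` is just a placeholder value for the region.

```python
import os
from typing import Dict, Mapping


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Keyword arguments for boto3.client("s3", ...) built from the S3_* variables."""
    return {
        "endpoint_url": env["S3_ENDPOINT"],  # e.g. https://s3ds.mdx.jp
        "aws_access_key_id": env["S3_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["S3_SECRET_ACCESS_KEY"],
        # S3_REGION is hypothetical; "us-east-1" is only a placeholder
        "region_name": env.get("S3_REGION", "us-east-1"),
    }
```

With boto3 installed, `boto3.client("s3", **s3_client_kwargs())` returns a client whose `get_object(Bucket=os.environ["S3_BUCKET"], Key=key)["Body"].read()` yields the stored file's bytes.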

As a result, the following architecture can be achieved.

Future Outlook: Connecting with LEAF-Writer

We have not set up an editing environment for the TEI/XML files uploaded through Drupal (that is, stored in mdx I’s object storage). However, the following LEAF-Writer Drupal module could potentially allow TEI/XML files to be edited and managed entirely within the CMS.

https://gitlab.com/calincs/cwrc/leaf-writer/leaf_writer

Also, the following prototype connecting GakuNin RDM and LEAF-Writer may be a useful reference.

Summary

This article introduced an example of hosting TEI/XML files on S3-compatible object storage. Since there are both advantages and disadvantages to this approach, I hope this article serves as a useful reference when considering architectures suited to your use case.