Overview

I recently had the opportunity to display audio files with subtitles in IIIF viewers, so I am leaving this memo.

The material is “Accents and Intonation of the Japanese Language (Part 2)”, published in the National Diet Library Historical Sound Archive. The subtitles were generated with OpenAI's speech-to-text API, so please note that the transcription may contain errors.

The following is a display example in Ramp.

https://ramp.avalonmediasystem.org/?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json

The following is a display example in Clover.

https://samvera-labs.github.io/clover-iiif/docs/viewer/demo?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json

The following is a display example in Aviary. Unfortunately, with the manifest structure used here, Aviary did not display the transcription text.

https://iiif.aviaryplatform.com/player?manifest=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json
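All three links follow the same pattern: the viewer's base URL plus a query parameter that carries the manifest URL (`iiif-content` for Ramp and Clover, `manifest` for Aviary). The sketch below simply mirrors the links above so another manifest can be tried quickly; the helper name `viewer_url` and the `VIEWERS` table are hypothetical, not part of any viewer's API.

```python
# Viewer base URLs and query parameter names, copied from the example links above
VIEWERS = {
    "ramp": ("https://ramp.avalonmediasystem.org/", "iiif-content"),
    "clover": ("https://samvera-labs.github.io/clover-iiif/docs/viewer/demo", "iiif-content"),
    "aviary": ("https://iiif.aviaryplatform.com/player", "manifest"),
}


def viewer_url(viewer: str, manifest_url: str) -> str:
    """Build a link that opens manifest_url in the given viewer (hypothetical helper)."""
    base, param = VIEWERS[viewer]
    # The links in this memo pass the manifest URL unencoded, so this does the same
    return f"{base}?{param}={manifest_url}"


manifest = "https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json"
for name in VIEWERS:
    print(viewer_url(name, manifest))
```

If a manifest URL ever contains characters such as `&` or `#`, it would need percent-encoding first (for example with `urllib.parse.quote(manifest_url, safe="")`); the URLs used here do not.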

Below, I explain how these manifest files were created.

Preparing the mp4 File

Obtain the mp4 file as described in the following article.

Creating the VTT File

Transcribe the audio with the OpenAI API, requesting the result in WebVTT format.

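import os
from openai import OpenAI

# Hypothetical placeholder paths; replace them with your own files
output_mp4_path = "3571280.mp4"
output_vtt_path = "3571280.vtt"

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Transcribe with Whisper; response_format="vtt" returns the subtitles as a string
with open(output_mp4_path, "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="vtt")

with open(output_vtt_path, "w", encoding="utf-8") as file:
    file.write(transcript)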
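Before adding the VTT file to a manifest, it can help to check that the output is well-formed WebVTT: a `WEBVTT` header followed by blank-line-separated cues whose timing lines look like `00:00:01.000 --> 00:00:04.500`. The following is a minimal standard-library sketch; `parse_vtt` is a hypothetical helper, and it only handles the simple cue layout (optional identifier, one timing line, text lines), not the full WebVTT specification (NOTE/STYLE blocks are simply skipped, cue settings are ignored).

```python
import re

# Timing line such as "00:00:01.000 --> 00:00:04.500"; the hours field is optional
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert matched timestamp fields to seconds (h may be None)."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_vtt(text):
    """Return (start, end, text) tuples for the cues in simple WebVTT text."""
    text = text.lstrip("\ufeff").replace("\r\n", "\n")
    if not text.startswith("WEBVTT"):
        raise ValueError("missing WEBVTT header")
    cues = []
    # Blocks are separated by blank lines; the first block is the header
    for block in text.split("\n\n")[1:]:
        lines = [ln for ln in block.split("\n") if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.match(line)
            if match:  # an optional cue identifier may precede the timing line
                g = match.groups()
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), "\n".join(lines[i + 1:])))
                break
    return cues


sample = "WEBVTT\n\n00:00:00.000 --> 00:00:02.500\nhello\n\n00:02.500 --> 00:04.000\nsecond cue\n"
for start, end, cue_text in parse_vtt(sample):
    print(f"{start:7.3f} -> {end:7.3f}  {cue_text}")
```

The string returned by the transcription call could be passed to `parse_vtt` directly, for example to confirm that cues exist and that their end times do not exceed the audio duration.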

Creating the Manifest File

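The following program creates the manifest file. It is a partial listing: values such as the URL prefix, the label, the media URLs, and the output path need to be set for your own data.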

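from iiif_prezi3 import Manifest, AnnotationPage, Annotation, ResourceItem, config
# moviepy 1.x import path; moviepy 2.x uses "from moviepy import VideoFileClip"
from moviepy.editor import VideoFileClip

# Hypothetical placeholder values; replace them with your own
prefix = "https://example.org/iiif/3571280"
label = "Sample audio"
mp4_path = "3571280.mp4"
mp4_url = "https://example.org/media/3571280.mp4"
vtt_url = "https://example.org/media/3571280.vtt"
output_path = "manifest.json"


def get_video_duration(filename):
    # Duration of the media file in seconds
    with VideoFileClip(filename) as video:
        return video.duration


# Plain-string labels are turned into language maps tagged "ja"
config.configs['helpers.auto_fields.AutoLang'].auto_lang = "ja"

duration = get_video_duration(mp4_path)

manifest = Manifest(id=f"{prefix}/manifest.json", label=label)
canvas = manifest.make_canvas(id=f"{prefix}/canvas", duration=duration)

# Painting annotation: the audio played on the canvas
anno_body = ResourceItem(id=mp4_url,
                         type="Sound",
                         format="audio/mp4",
                         duration=duration)
anno_page = AnnotationPage(id=f"{prefix}/canvas/page")
anno = Annotation(id=f"{prefix}/canvas/page/annotation",
                  motivation="painting",
                  body=anno_body,
                  target=canvas.id)
anno_page.add_item(anno)
canvas.add_item(anno_page)

# Add VTT URL: a supplementing annotation pointing at the subtitle file
vtt_body = ResourceItem(id=vtt_url, type="Text", format="text/vtt")
vtt_anno = Annotation(
    id=f"{prefix}/canvas/annotation/webvtt",
    motivation="supplementing",
    body=vtt_body,
    target=canvas.id,
    label="WebVTT Transcript (machine-generated)",
)
vtt_anno_page = AnnotationPage(id=f"{prefix}/canvas/page/2")
vtt_anno_page.add_item(vtt_anno)
canvas.annotations = [vtt_anno_page]

with open(output_path, "w", encoding="utf-8") as f:
    f.write(manifest.json(indent=2))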
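For reference, the structure assembled by the program can also be written out as a plain dictionary, which makes the two roles visible: the audio is a `painting` annotation in the Canvas `items`, while the VTT file is a `supplementing` annotation in the Canvas `annotations`. This standard-library sketch is not how the memo builds the manifest (that is done with iiif-prezi3), and the exact JSON that iiif-prezi3 emits (key order, extra fields) may differ; the function `audio_manifest`, its parameters, and the example.org values are hypothetical.

```python
import json


def audio_manifest(prefix, label, mp4_url, vtt_url, duration, lang="ja",
                   vtt_label="WebVTT Transcript (machine-generated)"):
    """Sketch of a IIIF Presentation 3 manifest: one audio canvas plus a VTT track."""
    canvas_id = f"{prefix}/canvas"
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{prefix}/manifest.json",
        "type": "Manifest",
        "label": {lang: [label]},
        "items": [{
            "id": canvas_id,
            "type": "Canvas",
            "duration": duration,
            # Painting annotation: the audio played on the canvas
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/page/annotation",
                    "type": "Annotation",
                    "motivation": "painting",
                    "body": {"id": mp4_url, "type": "Sound",
                             "format": "audio/mp4", "duration": duration},
                    "target": canvas_id,
                }],
            }],
            # Supplementing annotation: the WebVTT subtitles
            "annotations": [{
                "id": f"{canvas_id}/page/2",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation/webvtt",
                    "type": "Annotation",
                    "motivation": "supplementing",
                    "label": {lang: [vtt_label]},
                    "body": {"id": vtt_url, "type": "Text", "format": "text/vtt"},
                    "target": canvas_id,
                }],
            }],
        }],
    }


# Hypothetical values for illustration only
m = audio_manifest("https://example.org/iiif/demo", "Demo",
                   "https://example.org/demo.mp4", "https://example.org/demo.vtt", 123.4)
print(json.dumps(m, ensure_ascii=False, indent=2))
```

Comparing this shape with the file written by the program is a quick way to confirm that the VTT annotation ended up under `annotations` rather than `items`, which is where Ramp and Clover look for supplementing content.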

The program uses the iiif-prezi3 library. Please also refer to the following article.

Summary

I hope this serves as a useful reference for applying IIIF to video and audio content.