Overview
I had the opportunity to display an audio file with subtitles in an IIIF viewer, so this is a memo of how I did it.
The target is “Accents and Intonation of the Japanese Language (Part 2)”, published in the National Diet Library Historical Sound Archive. The transcription was produced with OpenAI’s Speech to text, so please note that it may contain errors.
The following is a display example in Ramp.
https://ramp.avalonmediasystem.org/?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json
The following is a display example in Clover.
https://samvera-labs.github.io/clover-iiif/docs/viewer/demo?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json
The following is a display example in Aviary. Unfortunately, with the manifest structure used here, Aviary did not display the transcription text.
https://iiif.aviaryplatform.com/player?manifest=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json
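All three viewers receive the manifest URL as a query parameter, so the links above can be composed from a single manifest URL. The following is a minimal sketch: the base URLs and parameter names are copied from the links above, while the helper name `viewer_urls` is made up for illustration and is not part of any library.

```python
# Viewer base URLs and query parameter names, copied from the example links above.
VIEWERS = {
    "ramp": ("https://ramp.avalonmediasystem.org/", "iiif-content"),
    "clover": ("https://samvera-labs.github.io/clover-iiif/docs/viewer/demo", "iiif-content"),
    "aviary": ("https://iiif.aviaryplatform.com/player", "manifest"),
}


def viewer_urls(manifest_url: str) -> dict[str, str]:
    """Return one display URL per viewer (hypothetical helper)."""
    # The manifest URL is appended without percent-encoding, matching the links above.
    return {
        name: f"{base}?{param}={manifest_url}"
        for name, (base, param) in VIEWERS.items()
    }


if __name__ == "__main__":
    manifest = "https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json"
    for name, url in viewer_urls(manifest).items():
        print(name, url)
```

Note that Ramp and Clover use `iiif-content` while Aviary uses `manifest`, which is why the parameter name is stored per viewer.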
The sections below describe how to create this manifest file.
Preparing the mp4 File
Obtain the mp4 file by referring to the following article.
Creating the VTT File
Perform transcription using the OpenAI API.
```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# output_mp4_path is the mp4 file prepared above; output_vtt_path is where the transcript is saved.
with open(output_mp4_path, "rb") as audio_file:
    # response_format="vtt" returns the transcript as a WebVTT string.
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="vtt",
    )

with open(output_vtt_path, "w", encoding="utf-8") as file:
    file.write(transcript)
```
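Because the transcription may contain errors, it can help to look through the cues before publishing the VTT file. The following is a minimal sketch using only the standard library. It assumes plain `start --> end` timing lines without styling or region blocks, and the names `parse_vtt` and `to_seconds` are made up for illustration.

```python
import re

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hour part is optional in WebVTT.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


def to_seconds(timestamp: str) -> float:
    """Convert "hh:mm:ss.ttt" or "mm:ss.ttt" to seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_vtt(text: str) -> list[tuple[float, float, str]]:
    """Return (start, end, text) per cue; the header and cue identifiers are skipped."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                start, end = (to_seconds(t) for t in match.groups())
                cues.append((start, end, "\n".join(lines[i + 1:]).strip()))
                break
    return cues


if __name__ == "__main__":
    sample = "WEBVTT\n\n00:00:00.000 --> 00:00:04.000\nHello\n"
    for start, end, text in parse_vtt(sample):
        print(f"{start:7.2f} {end:7.2f} {text}")
```

Printing the cues next to their times makes it easier to spot misrecognized passages while listening to the audio.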
Creating the Manifest File
The following program (an incomplete excerpt) creates the manifest file.
```python
from iiif_prezi3 import Manifest, AnnotationPage, Annotation, ResourceItem, config
# moviepy 1.x import path; moviepy 2.x exposes VideoFileClip from the top-level package.
from moviepy.editor import VideoFileClip


def get_video_duration(filename):
    with VideoFileClip(filename) as video:
        return video.duration


# prefix, label, mp4_path, mp4_url, vtt_url and output_path are not defined in this excerpt.
config.configs['helpers.auto_fields.AutoLang'].auto_lang = "ja"

duration = get_video_duration(mp4_path)

manifest = Manifest(id=f"{prefix}/manifest.json", label=label)
canvas = manifest.make_canvas(id=f"{prefix}/canvas", duration=duration)

# The audio itself is attached as a painting annotation.
anno_body = ResourceItem(id=mp4_url,
                         type="Sound",
                         format="audio/mp4",
                         duration=duration)
anno_page = AnnotationPage(id=f"{prefix}/canvas/page")
anno = Annotation(id=f"{prefix}/canvas/page/annotation",
                  motivation="painting",
                  body=anno_body,
                  target=canvas.id)
anno_page.add_item(anno)
canvas.add_item(anno_page)

# Add VTT URL
vtt_body = ResourceItem(id=vtt_url, type="Text", format="text/vtt")
vtt_anno = Annotation(
    id=f"{prefix}/canvas/annotation/webvtt",
    motivation="supplementing",
    body=vtt_body,
    target=canvas.id,
    label="WebVTT Transcript (machine-generated)",
)
vtt_anno_page = AnnotationPage(id=f"{prefix}/canvas/page/2")
vtt_anno_page.add_item(vtt_anno)
canvas.annotations = [vtt_anno_page]

with open(output_path, "w") as f:
    f.write(manifest.json(indent=2))
```
The iiif-prezi3 library is used. Please also refer to the following article.
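In IIIF Presentation 3, a caption or transcript file is typically attached to the canvas as an annotation with the `supplementing` motivation and a `text/vtt` body. The following is a rough check, using only the standard library, that a generated manifest contains such an annotation. The function name `find_vtt_bodies` and the trimmed sample manifest with `example.org` URLs are illustrative assumptions, not the actual output for this recording.

```python
import json


def find_vtt_bodies(manifest: dict) -> list[str]:
    """Collect the ids of text/vtt bodies attached as supplementing annotations."""
    found = []
    for canvas in manifest.get("items", []):
        # Non-painting annotations live under the canvas "annotations" property.
        for page in canvas.get("annotations", []):
            for anno in page.get("items", []):
                if anno.get("motivation") != "supplementing":
                    continue
                body = anno.get("body", [])
                bodies = body if isinstance(body, list) else [body]
                for item in bodies:
                    if isinstance(item, dict) and item.get("format") == "text/vtt":
                        found.append(item.get("id"))
    return found


# Trimmed, hand-written sample in Presentation 3 shape (placeholder URLs).
SAMPLE = json.loads("""
{
  "@context": "http://iiif.io/api/presentation/3/context.json",
  "id": "https://example.org/manifest.json",
  "type": "Manifest",
  "items": [{
    "id": "https://example.org/canvas",
    "type": "Canvas",
    "duration": 10.0,
    "annotations": [{
      "id": "https://example.org/canvas/page/2",
      "type": "AnnotationPage",
      "items": [{
        "id": "https://example.org/canvas/annotation/webvtt",
        "type": "Annotation",
        "motivation": "supplementing",
        "body": {"id": "https://example.org/transcript.vtt", "type": "Text", "format": "text/vtt"},
        "target": "https://example.org/canvas"
      }]
    }]
  }]
}
""")

if __name__ == "__main__":
    print(find_vtt_bodies(SAMPLE))
```

To check a real file, load it with `json.load` and pass the resulting dictionary to the same function. An empty result suggests the transcript was not attached in this form.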
Summary
I hope this serves as a useful reference for applying IIIF to video and audio content.