IIIF Audio/Visual: Describing Multiple VTT Files

Overview

This is a note on how to describe multiple VTT files for Audio/Visual materials using IIIF. The example below provides transcription text in both Japanese and English.

https://ramp.avalonmediasystem.org/?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json

Manifest File Description

An example manifest is stored at the following location.

https://github.com/nakamura196/ramp_data/blob/main/docs/demo/3571280/manifest.json

Please also refer to the following article.

Specifically, when the VTT files were described as multiple annotations, one per file, as shown below, the Ramp viewer processed them correctly.

    "annotations": [
      {
        "id": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas/page/2",
        "type": "AnnotationPage",
        "items": [
          {
            "id": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas/annotation/webvtt",
            "type": "Annotation",
            "label": {
              "ja": [
                "日本語 (machine-generated)"
              ]
            },
            "motivation": "supplementing",
            "body": {
              "id": "https://nakamura196.github.io/ramp_data/demo/3571280/3571280.vtt",
              "type": "Text",
              "format": "text/vtt",
              "label": {
                "ja": [
                  "日本語 (machine-generated)"
                ]
              }
            },
            "target": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas"
          },
          {
            "id": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas/annotation/webvtt/2",
            "type": "Annotation",
            "label": {
              "ja": [
                "English (machine-generated)"
              ]
            },
            "motivation": "supplementing",
            "body": {
              "id": "https://nakamura196.github.io/ramp_data/demo/3571280/3571280_en.vtt",
              "type": "Text",
              "format": "text/vtt",
              "label": {
                "ja": [
                  "English (machine-generated)"
                ]
              }
            },
            "target": "https://nakamura196.github.io/ramp_data/demo/3571280/canvas"
          }
        ]
      }
    ]

Note that in Clover, the two transcription texts were displayed consecutively.

https://samvera-labs.github.io/clover-iiif/docs/viewer/demo?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json

(Reference) Creating English Transcription Text

The English transcription text was created with the following program, which uses the GitHub version of Whisper.

https://github.com/openai/whisper

    import whisper


    def format_timestamp(seconds):
        """Converts time in seconds to a formatted string 'HH:MM:SS.mmm'."""
        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        seconds = seconds % 60
        return f"{hours:02}:{minutes:02}:{seconds:06.3f}"


    def write_vtt(transcription, file_path):
        # Write the Whisper segments as WebVTT cues.
        with open(file_path, 'w', encoding='utf-8') as file:
            file.write("WEBVTT\n\n")
            for i, segment in enumerate(transcription['segments']):
                start = format_timestamp(segment['start'])
                end = format_timestamp(segment['end'])
                text = segment['text'].strip()
                # Cue identifiers (i + 1) are optional in WebVTT and omitted here.
                file.write(f"{start} --> {end}\n{text}\n\n")


    def translate(input_path, output_path, verbose=False):
        model = whisper.load_model('medium')
        # task="translate" makes Whisper output English text for the Japanese audio.
        result = model.transcribe(input_path, verbose=verbose, language="ja", task="translate")
        write_vtt(result, output_path)
        return result

Initially, I tried translating with the API version of Whisper as follows, but the output was in Japanese, and I was unable to produce English text.

    transcript = client.audio.translations.create(
        model="whisper-1",
        file=audio_file,
        response_format="vtt",
    )

Summary

I hope this is helpful for describing multiple transcription and subtitle files.
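As a supplement to the manifest description, the sketch below shows one way to generate the `annotations` array for any number of VTT files rather than writing it by hand. It is only an illustration under stated assumptions: the function name `build_vtt_annotations`, the `tracks` dictionaries, the `example.org` URLs, and the ID pattern are hypothetical and are not taken from the manifest discussed in this note.

```python
import json


def build_vtt_annotations(base_url, tracks):
    """Return a IIIF `annotations` list with one Annotation per VTT file.

    base_url: URL prefix under which the canvas and VTT files live
              (hypothetical layout, chosen for this sketch).
    tracks:   list of dicts with the keys "vtt" (file name),
              "label" (display text) and "lang" (language key of the label).
    """
    canvas_id = f"{base_url}/canvas"
    items = []
    for index, track in enumerate(tracks, start=1):
        label = {track["lang"]: [track["label"]]}
        items.append({
            # Assumed ID pattern; any unique URI would do.
            "id": f"{canvas_id}/annotation/webvtt/{index}",
            "type": "Annotation",
            "label": label,
            "motivation": "supplementing",
            "body": {
                "id": f"{base_url}/{track['vtt']}",
                "type": "Text",
                "format": "text/vtt",
                "label": label,
            },
            "target": canvas_id,
        })
    # All VTT annotations go into a single AnnotationPage.
    return [{
        "id": f"{canvas_id}/page/2",
        "type": "AnnotationPage",
        "items": items,
    }]


if __name__ == "__main__":
    annotations = build_vtt_annotations(
        "https://example.org/demo/item",
        [
            {"vtt": "item.vtt", "label": "日本語 (machine-generated)", "lang": "ja"},
            {"vtt": "item_en.vtt", "label": "English (machine-generated)", "lang": "en"},
        ],
    )
    print(json.dumps({"annotations": annotations}, ensure_ascii=False, indent=2))
```

The printed fragment can be merged into the canvas of a manifest. Each track is a separate Annotation inside one AnnotationPage, which is the arrangement this note reports as working in the Ramp viewer.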
