Whisper

Auto-Generating English Subtitles and Audio for Videos with Azure OpenAI Whisper + Speech Services

I have summarized how to automatically add English subtitles and English audio to Japanese videos. This uses Azure OpenAI Service’s Whisper and Speech Services. Overview The goal this time is to make a Japanese audio video multilingual as follows: Japanese version: Original video (Japanese audio, no subtitles) English version: English audio + English subtitles Services Used Service Purpose Azure OpenAI Service (Whisper) Translation from Japanese audio to English text Azure Speech Services (TTS) Synthesis from English text to English audio FFmpeg Audio extraction and video merging Procedure 1. Environment Setup Required Tools # b # p r i I e P p n w y s t i t i h n a n o s l s n t l t a a l l F l i l F l b m r p p f a y e f r t g m i h p e o ( e s n m g - a d c o O t S e ) n v r e q u e s t s Azure Configuration (.env) A A A A Z Z Z Z U U U U R R R R E E E E _ _ _ _ O O O O P P P P E E E E N N N N A A A A I I I I _ _ _ _ E A D A N P E P D I P I P _ L _ O K O V I E Y E N Y M R T = E S = y N I h o T O t u _ N t r N = p - A 2 s a M 0 : p E 2 / i = 4 / - w - x k h 0 x e i 6 x y s - x p 0 x e 1 . r o p e n a i . a z u r e . c o m 2. Extract Audio from Video Since the Azure Whisper API has a 25MB file size limit, the audio is compressed and extracted. ...

Adding Images to IIIF Manifest Files for Audio Materials

Overview This is a note on trying out the Audio Presentation with Accompanying Image recipe. https://iiif.io/api/cookbook/recipe/0014-accompanyingcanvas/ The following is an example displayed in Clover, where the configured image appears in the player. https://samvera-labs.github.io/clover-iiif/docs/viewer/demo?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json Manifest File Description An example is stored at the following location. https://github.com/nakamura196/ramp_data/blob/main/docs/demo/3571280/manifest.json Specifically, it was necessary to add an accompanyingCanvas to the Canvas as follows. { " " " " } i t d a , d y u c " " " " " ] " p r c i t h w i : e a o d y e i t { } " t m " p i d e " : i p : e g t m " " " ] h o a " h h s i t i t " n n " : t " " d y t { } t C " y h " : : " p e p a : i t " : : e m " " " " } " s n n t C 1 [ " s i t m b , t : 1 g p a 1 0 " : " d y o o " " " " " a / a 5 C s n 0 2 h : " p t d i t h w f r / s 6 a : 2 4 t " : e i y d y e i o g n " . n / a 4 , t A [ " v " " p i d r e a , 0 / s , p n " : a : : e g t m t k 7 a n " s n h t " h h a " a 9 s a , : t " i { " : t " t : m 9 " k t t A o h " : " u 9 : a / a p n n t " : : " r 9 m n t s n " t I 1 h a 9 { u a i : : p m 1 0 " t 1 9 r k o t s a 0 2 i t 9 9 a a n / a " : g 2 4 m p 6 9 1 m P n t p / e 4 , a s . 9 9 u a a i a / " , g : g 9 6 r g k o i n , e / i 9 . a e a n n a / / t 8 g 1 " m " t k j n h , i 9 , u , i a p a u t 6 r n m e k b h . a g u g a . u g 1 " r " m i b i 9 , a u o . t 6 1 r / i h . 9 a r o u g 6 1 a / b i . 9 m r . t g 6 p a i h i . _ m o u t g d p / b h i a _ r . u t t d a i b h a a m o . u / t p / i b d a _ r o . e / d a / i m d a m r o o e t p a / / m a _ m r 3 o / d p a 5 / d a _ m 7 3 e t d p 1 5 m a a _ 2 7 o / t d 8 1 / d a a 0 2 3 e / t / 8 5 m d a c 0 7 o e / a / 1 / m d n c 2 3 o e v a 8 5 / m a n 0 7 3 o s v / 1 5 / " a c 2 7 3 , s a 8 1 5 / n 0 2 7 a v / 8 1 c a c 0 2 c s a / 8 o / n 3 0 m a v 5 / p c a 7 c a c s 1 a n o / 2 n y m a 8 v i p c 0 a n a c _ s g n o s / " y m u a , i p m c n a m c g n a o / y r m a i y p n n _ a n g i n o / m y t a a i a n g n t n e g i o . / o t j a n a p n / t g n p i " o a o , t g n a e / t " i i , m o a n g / e p " a , g e " It may be a bit hard to follow, but here is the code example using iiif_prezi3. The accompanyingCanvas is created via create_accompanying_canvas() and associated with the canvas. ...

IIIF Audio/Visual: Describing Multiple VTT Files

Overview This is a note on how to describe multiple VTT files for Audio/Visual materials using IIIF. Here, we describe transcription text in both Japanese and English as shown below. https://ramp.avalonmediasystem.org/?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json Manifest File Description An example is stored at the following location. https://github.com/nakamura196/ramp_data/blob/main/docs/demo/3571280/manifest.json Please also refer to the following article. Specifically, by describing them as multiple annotations as shown below, they were correctly processed by the Ramp viewer. ...

Displaying Audio Files with Subtitles in an IIIF Viewer

Overview I had the opportunity to display audio files with subtitles in an IIIF viewer, so this is a memo. The target is “Accents and Intonation of the Japanese Language (Part 2)” published in the National Diet Library Historical Sound Archive. OpenAI’s Speech to text was used. Please note that the transcription results may contain errors. The following is a display example in Ramp. https://ramp.avalonmediasystem.org/?iiif-content=https://nakamura196.github.io/ramp_data/demo/3571280/manifest.json The following is a display example in Clover. ...