Auto-Generating English Subtitles and Audio for Videos with Azure OpenAI Whisper + Speech Services

I have summarized how to automatically add English subtitles and English audio to Japanese videos using Azure OpenAI Service's Whisper and Azure Speech Services.

Overview

The goal this time is to make a video with Japanese audio multilingual, as follows:

Japanese version: original video (Japanese audio, no subtitles)
English version: English audio + English subtitles

Services Used

Azure OpenAI Service (Whisper): translation from Japanese audio to English text
Azure Speech Services (TTS): synthesis of English audio from English text
FFmpeg: audio extraction and video merging

Procedure

1. Environment Setup

Required Tools

```bash
# Install FFmpeg (macOS)
brew install ffmpeg

# Python libraries
pip install python-dotenv requests
```

Azure Configuration (.env)

```
AZURE_OPENAI_ENDPOINT=https://xxxxx.openai.azure.com
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_DEPLOYMENT_NAME=whisper
AZURE_OPENAI_API_VERSION=2024-06-01
```

2. Extract Audio from Video

Since the Azure OpenAI Whisper API has a 25 MB file size limit, the audio is compressed as it is extracted. ...
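As a rough sketch of that compression step (the article's own command is cut off here), the Python below picks an audio bitrate from the video's duration so the extracted file should stay under 25 MB, then calls FFmpeg. Everything in it is an assumption for illustration: the helper names, the 10% safety margin, treating 25 MB as 25 × 1024 × 1024 bytes, and the choice of mono 16 kHz MP3 capped at 64 kbit/s.

```python
import subprocess

# Assumption: treat the Whisper upload limit as 25 MiB.
WHISPER_MAX_BYTES = 25 * 1024 * 1024


def probe_duration_seconds(video_path: str) -> float:
    """Return the container duration reported by ffprobe (requires FFmpeg on PATH)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", video_path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())


def choose_bitrate_kbps(duration_s: float, limit_bytes: int = WHISPER_MAX_BYTES,
                        margin: float = 0.9, cap_kbps: int = 64) -> int:
    """Pick a constant bitrate (kbit/s) whose output should fit under limit_bytes.

    The result is floored to a multiple of 8 kbit/s (8..64 are all valid MP3
    bitrates at 16 kHz) and capped, since speech rarely needs more.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    budget_kbps = limit_bytes * 8 * margin / duration_s / 1000
    kbps = min(cap_kbps, int(budget_kbps) // 8 * 8)
    if kbps < 8:
        raise ValueError("audio too long for one upload; split it into chunks")
    return kbps


def build_extract_cmd(video_path: str, audio_path: str, kbps: int) -> list[str]:
    """FFmpeg command: drop the video stream, downmix to mono 16 kHz, encode at kbps."""
    return ["ffmpeg", "-y", "-i", video_path,
            "-vn",            # no video
            "-ac", "1",       # mono
            "-ar", "16000",   # 16 kHz sample rate
            "-b:a", f"{kbps}k",
            audio_path]       # .mp3 output assumes an FFmpeg build with an MP3 encoder


def extract_audio(video_path: str, audio_path: str = "audio.mp3") -> str:
    """Probe the duration, choose a bitrate, and run FFmpeg (hypothetical helper)."""
    kbps = choose_bitrate_kbps(probe_duration_seconds(video_path))
    subprocess.run(build_extract_cmd(video_path, audio_path, kbps), check=True)
    return audio_path
```

For a one-hour video this works out to 48 kbit/s, about 21.6 MB of audio before container overhead; a video long enough to need less than 8 kbit/s would have to be split into chunks instead.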

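Using the four variables from the .env configuration, the Whisper translation call (Japanese audio in, English text out) can be sketched as below. It assumes the Azure OpenAI REST route `/openai/deployments/{deployment}/audio/translations`, the `api-key` header, and `response_format=srt` so the English text comes back already timed as subtitles; the function names are hypothetical, and only `translate_to_english_srt` needs the `requests` package from the setup step.

```python
import os
from urllib.parse import urlencode


def whisper_translation_request(env=os.environ):
    """Build the URL and headers for an Azure OpenAI Whisper translation call.

    Reads the AZURE_OPENAI_* variables; python-dotenv's load_dotenv() can copy
    them from .env into os.environ beforehand.
    """
    endpoint = env["AZURE_OPENAI_ENDPOINT"].rstrip("/")
    deployment = env["AZURE_OPENAI_DEPLOYMENT_NAME"]
    query = urlencode({"api-version": env["AZURE_OPENAI_API_VERSION"]})
    url = f"{endpoint}/openai/deployments/{deployment}/audio/translations?{query}"
    headers = {"api-key": env["AZURE_OPENAI_API_KEY"]}
    return url, headers


def translate_to_english_srt(audio_path, env=os.environ):
    """Upload the extracted audio and return English subtitles as SRT text."""
    import requests  # imported here so the URL helper above has no third-party dependency

    url, headers = whisper_translation_request(env)
    with open(audio_path, "rb") as f:
        resp = requests.post(
            url,
            headers=headers,
            files={"file": (os.path.basename(audio_path), f)},
            data={"response_format": "srt"},
            timeout=300,
        )
    resp.raise_for_status()
    return resp.text
```

Keeping the request construction separate from the upload makes it easy to check the URL and headers without network access.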
January 21, 2026 · 11 min · Nakamura