Transcription Project: automated transcription of video files
This Python script automatically extracts audio from video files and transcribes it with Vosk, an open-source offline speech recognition toolkit. It is intended for processing video lectures, so that text transcriptions can be generated automatically and used to create educational materials.
Functionality:
1. Extracting audio tracks from video files.
2. Converting audio files to mono at a sample rate of 16000 Hz for better recognition.
3. Full transcription of audio to text.
4. Detailed logging of all stages of the process.
5. Deleting temporary files to save space on the server.
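The five steps above can be sketched as a single function. This is a minimal illustration, not the project's actual code: the name process_video and the three callables passed into it (extract_audio, to_mono_16k, transcribe) are hypothetical placeholders for the MoviePy, Pydub, and Vosk steps, and the log format is an assumption.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("transcription")


def process_video(
    video_path: Path,
    extract_audio: Callable[[Path, Path], None],  # hypothetical: video -> audio file
    to_mono_16k: Callable[[Path, Path], None],    # hypothetical: audio -> mono 16 kHz WAV
    transcribe: Callable[[Path], str],            # hypothetical: WAV -> text
) -> str:
    """Run the five steps for one video and return the transcript text."""
    # The temporary directory and everything in it is deleted when the
    # `with` block exits, even if one of the steps raises (step 5).
    with tempfile.TemporaryDirectory(prefix="transcribe_") as tmp:
        raw_audio = Path(tmp) / "raw.wav"
        mono_audio = Path(tmp) / "mono16k.wav"

        log.info("Extracting audio from %s", video_path)  # steps 1 and 4
        extract_audio(video_path, raw_audio)

        log.info("Converting audio to mono, 16000 Hz")    # step 2
        to_mono_16k(raw_audio, mono_audio)

        log.info("Transcribing %s", mono_audio.name)      # step 3
        text = transcribe(mono_audio)

    log.info("Finished %s (%d characters)", video_path.name, len(text))
    return text
```

Keeping the intermediate files inside a temporary directory means cleanup does not depend on every step succeeding, which is one way to keep disk usage on the server bounded.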
Key technologies:
• Vosk: for automatic transcription.
• MoviePy: for extracting audio tracks from video.
• Pydub: for processing and normalizing audio files.
• TQDM: for displaying processing progress.
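As a rough sketch of how Vosk and TQDM might fit together, the function below feeds a WAV file to a Vosk recognizer in chunks and advances a progress bar as it goes. The function names, the chunk size, and the model path argument are assumptions made for illustration; the input is assumed to be a mono 16-bit PCM WAV file.

```python
import json
import wave
from typing import Iterable


def join_results(json_results: Iterable[str]) -> str:
    """Join the "text" fields of Vosk JSON results into one transcript."""
    parts = (json.loads(r).get("text", "") for r in json_results)
    return " ".join(p for p in parts if p)


def transcribe_wav(wav_path: str, model_path: str, chunk_frames: int = 4000) -> str:
    """Transcribe a mono 16-bit PCM WAV file with Vosk, showing a tqdm bar."""
    # Imported here so join_results works without the third-party packages.
    from tqdm import tqdm
    from vosk import KaldiRecognizer, Model

    results = []
    with wave.open(wav_path, "rb") as wf:
        recognizer = KaldiRecognizer(Model(model_path), wf.getframerate())
        bytes_per_frame = wf.getsampwidth() * wf.getnchannels()
        with tqdm(total=wf.getnframes(), unit="frame") as bar:
            while True:
                data = wf.readframes(chunk_frames)
                if not data:
                    break
                # AcceptWaveform returns a truthy value when an utterance is complete.
                if recognizer.AcceptWaveform(data):
                    results.append(recognizer.Result())
                bar.update(len(data) // bytes_per_frame)
        # FinalResult flushes whatever audio is still buffered.
        results.append(recognizer.FinalResult())
    return join_results(results)
```

Vosk returns each result as a JSON string, so the small join_results helper is what turns the sequence of partial outputs into a single transcript.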
Resolved tasks and challenges:
• The audio quality issue was resolved by converting to mono and resampling to 16000 Hz.
• High server load due to large video volumes was addressed by automatically deleting temporary files after transcription.
• Monitoring of long-running jobs was improved with a progress bar that tracks the current status.
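The mono conversion at 16000 Hz could look roughly like the sketch below. It uses Pydub for the conversion and the standard-library wave module to verify the result; both function names are hypothetical, and the 16-bit sample width is an assumption about what the recognizer is given.

```python
import wave


def convert_to_mono_16k(src_path: str, dst_path: str) -> None:
    """Downmix to mono, resample to 16000 Hz, and export as 16-bit WAV."""
    # Third-party import; Pydub relies on ffmpeg for non-WAV input.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(src_path)
    audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
    audio.export(dst_path, format="wav")


def is_mono_16k(wav_path: str) -> bool:
    """Check that a WAV file is mono, 16-bit PCM at 16000 Hz."""
    with wave.open(wav_path, "rb") as wf:
        actual = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate())
    return actual == (1, 2, 16000)
```

Running is_mono_16k on the converted file before transcription gives an early, cheap failure if the audio is not in the expected format.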
Results:
This project gave the client a tool for quickly and automatically creating lecture transcriptions. It significantly reduced video processing time and delivered ready-to-use text materials.
Tags (hashtags):
#python #transcription #speech-to-text #audioextraction #automatedworkflow #vosk #pydub #moviepy #audioprocessing #audiotranscription