Transcription and translation of videos using fine-tuned XLSR Wav2Vec2 on custom dataset and mBART

Tathe, Aniket; Kamble, Anand; Kumbharkar, Suyash; Bhandare, Atharva; Mitra, Anirban C.

(Submitted on 1 Mar 2024)

Abstract: This research addresses the challenge of training an ASR model for personalized voices with minimal data. Utilizing just 14 minutes of custom audio from a YouTube video, we employ Retrieval-Based Voice Conversion (RVC) to create a custom Common Voice 16.0 corpus. Subsequently, a Cross-lingual Self-supervised Representations (XLSR) Wav2Vec2 model is fine-tuned on this dataset. The developed web-based GUI efficiently transcribes and translates input Hindi videos. By integrating XLSR Wav2Vec2 and mBART, the system aligns the translated text with the video timeline, delivering an accessible solution for multilingual video content transcription and translation for personalized voices.

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2403.00212 [cs.CL] (or arXiv:2403.00212v1 [cs.CL] for this version)

Submission history

From: Anand Kamble
[v1] Fri, 1 Mar 2024 01:15:45 GMT (5903kb,D)
