Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech

Lehečka, Jan; Švec, Jan; Pražák, Aleš; Psutka, Josef V.

doi:10.21437/Interspeech.2022-10439

(Submitted on 15 Jun 2022)

Abstract: In this paper, we present our progress in pretraining Czech monolingual audio transformers on a large dataset containing more than 80 thousand hours of unlabeled speech, and in subsequently fine-tuning the models on automatic speech recognition (ASR) tasks using a combination of in-domain data and almost 6 thousand hours of out-of-domain transcribed speech. We present a wide range of experiments with various fine-tuning setups, evaluated on two public datasets (CommonVoice and VoxPopuli) and one extremely challenging dataset from the MALACH project. Our results show that monolingual Wav2Vec 2.0 models are robust ASR systems that can take advantage of large labeled and unlabeled datasets and successfully compete with state-of-the-art LVCSR systems. Moreover, the Wav2Vec models proved to be good zero-shot learners when no training data are available for the target ASR task.

Comments:	to be published in Proceedings of INTERSPEECH 2022
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Journal reference:	Interspeech 2022, 1831-1835
Cite as:	arXiv:2206.07627 [cs.CL]
	(or arXiv:2206.07627v1 [cs.CL] for this version)

Submission history

From: Jan Lehečka
[v1] Wed, 15 Jun 2022 16:14:37 GMT (30kb)