BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge

Kocour, Martin; Cámbara, Guillermo; Luque, Jordi; Bonet, David; Farrús, Mireia; Karafiát, Martin; Veselý, Karel; Černocký, Jan "Honza"

(Submitted on 29 Jan 2021)

Abstract: This paper describes the joint effort of BUT and Telefónica Research on the development of Automatic Speech Recognition systems for the Albayzin 2020 Challenge. We compare approaches based on either hybrid or end-to-end models. In hybrid modelling, we explore the impact of a SpecAugment layer on performance. For end-to-end modelling, we used a convolutional neural network with gated linear units (GLUs). The performance of this model is also evaluated with an additional n-gram language model to improve word error rates. We further inspect source separation methods to extract speech from noisy environments (i.e. TV shows). More precisely, we assess the effect of using a neural-based music separator named Demucs. A fusion of our best systems achieved 23.33% WER in the official Albayzin 2020 evaluations. Aside from techniques used in our final submitted systems, we also describe our efforts in retrieving high-quality transcripts for training.

Comments:	fusion, end-to-end model, hybrid model, semisupervised, automatic speech recognition, convolutional neural network
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
Cite as:	arXiv:2101.12729 [eess.AS]
	(or arXiv:2101.12729v1 [eess.AS] for this version)

Submission history

From: Jordi Luque
[v1] Fri, 29 Jan 2021 18:40:54 GMT (193kb,D)
