Improving Automatic Speech Recognition for Non-Native English with Transfer Learning and Language Model Decoding

Sullivan, Peter; Shibano, Toshiko; Abdul-Mageed, Muhammad

(Submitted on 10 Feb 2022)

Abstract: ASR systems designed for native English (L1) usually underperform on non-native English (L2). To address this performance gap, (i) we extend our previous work to investigate fine-tuning of a pre-trained wav2vec 2.0 model (Baevski et al., 2020; Xu et al., 2021) under a rich set of L1 and L2 training conditions. We further (ii) incorporate language model decoding in the ASR system, alongside the fine-tuning method. Quantifying the gains from each of these two approaches separately, together with an error analysis, allows us to identify different sources of improvement within our models. We find that while the large self-trained wav2vec 2.0 may internalize sufficient decoding knowledge for clean L1 speech (Xu et al., 2021), this does not hold for L2 speech, which accounts for the utility of language model decoding on L2 data.

Comments:	arXiv admin note: substantial text overlap with arXiv:2110.00678
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2202.05209 [cs.CL]
	(or arXiv:2202.05209v1 [cs.CL] for this version)

Submission history

From: Peter Sullivan
[v1] Thu, 10 Feb 2022 18:13:32 GMT (529kb,D)
