Recent Advances in End-to-End Automatic Speech Recognition

Li, Jinyu

Authors: Jinyu Li

(Submitted on 2 Nov 2021 (v1), last revised 2 Feb 2022 (this version, v2))

Abstract: The speech community has recently seen a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve state-of-the-art ASR accuracy on most benchmarks, hybrid models are still used in a large proportion of commercial ASR systems. Many practical factors affect the decision to deploy a model in production. Traditional hybrid models, having been optimized for production for decades, usually handle these factors well. Unless E2E models offer excellent solutions to all of these factors, it is hard for them to be widely commercialized. In this paper, we give an overview of recent advances in E2E models, focusing on technologies that address those challenges from the industry's perspective.

Comments: Accepted at APSIPA Transactions on Signal and Information Processing
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
Cite as: arXiv:2111.01690 [eess.AS] (or arXiv:2111.01690v2 [eess.AS] for this version)

Submission history

From: Jinyu Li
[v1] Tue, 2 Nov 2021 15:49:20 GMT (6878kb,D)
[v2] Wed, 2 Feb 2022 23:38:10 GMT (6908kb,D)
