Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

Moritz, Niko; Hori, Takaaki; Le Roux, Jonathan

(Submitted on 2 Jul 2021)

Abstract: Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results on numerous tasks. However, the application of self-attention and attention-based encoder-decoder models remains challenging for streaming ASR, where each word must be recognized shortly after it is spoken. In this work, we present the dual causal/non-causal self-attention (DCN) architecture, which, in contrast to restricted self-attention, prevents the overall context from growing beyond the look-ahead of a single layer when used in a deep architecture. DCN is compared to chunk-based and restricted self-attention using streaming transformer and conformer architectures, showing improved ASR performance over restricted self-attention and competitive ASR results compared to chunk-based self-attention, while providing the advantage of frame-synchronous processing. Combined with triggered attention, the proposed streaming end-to-end ASR systems obtained state-of-the-art results on the LibriSpeech, HKUST, and Switchboard ASR tasks.
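
The context growth mentioned in the abstract can be illustrated with a small sketch. It assumes that restricted self-attention lets each frame attend to all past frames and to a fixed number of future frames; under that assumption, stacking layers compounds the look-ahead. The function names and parameters below are hypothetical and do not come from the paper, and the sketch only models restricted self-attention; it does not implement DCN.

```python
def restricted_mask(num_frames, look_ahead):
    """mask[t][s] is True if frame t may attend to frame s.

    Assumption: unlimited past context, `look_ahead` future frames.
    """
    return [[s <= t + look_ahead for s in range(num_frames)]
            for t in range(num_frames)]


def compose(outer, inner):
    """Stack two layers: t depends on s if t attends to some u that depends on s."""
    n = len(outer)
    return [[any(outer[t][u] and inner[u][s] for u in range(n))
             for s in range(n)]
            for t in range(n)]


def effective_look_ahead(num_layers, look_ahead, num_frames=64, frame=0):
    """Largest future offset that can influence `frame` after `num_layers` layers."""
    layer = restricted_mask(num_frames, look_ahead)
    reach = layer
    for _ in range(num_layers - 1):
        reach = compose(layer, reach)
    return max(s for s in range(num_frames) if reach[frame][s]) - frame


# Hypothetical setting: 2 future frames per layer.
for depth in (1, 6, 12):
    print(depth, "layers ->", effective_look_ahead(depth, 2), "frames of look-ahead")
```

In this toy model the effective look-ahead equals the number of layers times the per-layer look-ahead (as long as the sequence is long enough), which is the growth that the abstract says DCN limits to the look-ahead of a single layer.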

Comments:	Accepted to Interspeech 2021
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2107.01269 [eess.AS]
	(or arXiv:2107.01269v1 [eess.AS] for this version)

Submission history

From: Niko Moritz
[v1] Fri, 2 Jul 2021 20:56:13 GMT (93kb,D)
