Deep Sparse Conformer for Speech Recognition

Wu, Xianchao

Computer Science > Computation and Language

Title: Deep Sparse Conformer for Speech Recognition

Authors: Xianchao Wu

(Submitted on 1 Sep 2022)

Abstract: Conformer has achieved impressive results in Automatic Speech Recognition (ASR) by combining the transformer's ability to capture content-based global interactions with the convolutional neural network's ability to exploit local features. In Conformer, two macaron-like feed-forward layers with half-step residual connections sandwich the multi-head self-attention and convolution modules, followed by a post layer normalization. We improve Conformer's long-sequence representation ability in two directions: \emph{sparser} and \emph{deeper}. We adapt a sparse self-attention mechanism with $\mathcal{O}(L \log L)$ time complexity and memory usage. A deep normalization strategy is applied to the residual connections to make training of Conformers with on the order of a hundred blocks feasible. On the Japanese CSJ-500h dataset, this deep sparse Conformer achieves CERs of 5.52\%, 4.03\% and 4.50\% on the three evaluation sets, respectively, and 4.16\%, 2.84\% and 3.20\% when ensembling five deep sparse Conformer variants with 12, 16, 17, 50, and 100 encoder layers.
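
For orientation, the baseline block structure described in the abstract (two half-step feed-forward layers sandwiching self-attention and convolution, then a post layer normalization) can be sketched as follows. This follows the standard Conformer formulation and is not taken from this page; the symbols $x$, $\tilde{x}$, $x'$, $x''$, $y$ and the module names FFN, MHSA, Conv and LayerNorm are notational assumptions, and the paper's sparse attention and deep normalization changes are not shown.

```latex
\begin{align}
  \tilde{x} &= x + \tfrac{1}{2}\,\mathrm{FFN}(x)          && \text{first half-step feed-forward residual} \\
  x'        &= \tilde{x} + \mathrm{MHSA}(\tilde{x})       && \text{multi-head self-attention residual} \\
  x''       &= x' + \mathrm{Conv}(x')                     && \text{convolution module residual} \\
  y         &= \mathrm{LayerNorm}\!\left(x'' + \tfrac{1}{2}\,\mathrm{FFN}(x'')\right) && \text{second half-step feed-forward, then post layer norm}
\end{align}
```

In this notation, full self-attention over a sequence of length $L$ costs $\mathcal{O}(L^2)$ in time and memory, which is the term that the $\mathcal{O}(L \log L)$ sparse mechanism in the abstract is meant to reduce.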

Comments:	5 pages, 1 figure
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2209.00260 [cs.CL]
	(or arXiv:2209.00260v1 [cs.CL] for this version)

Submission history

From: Xianchao Wu [view email]
[v1] Thu, 1 Sep 2022 06:56:11 GMT (611kb,D)
