Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models

Hernandez, Steven M.; Zhao, Ding; Ding, Shaojin; Bruguier, Antoine; Prabhavalkar, Rohit; Sainath, Tara N.; He, Yanzhang; McGraw, Ian

(Submitted on 15 Mar 2023)

Abstract: Continued improvements in machine learning techniques offer exciting new opportunities through the use of larger models and larger training datasets. However, there is a growing need to offer these new capabilities on-board low-powered devices such as smartphones, wearables, and other embedded environments where memory is limited. Towards this, we consider methods to reduce the size of Conformer-based speech recognition models, which typically have more than 100M parameters, down to just 5M parameters while minimizing the impact on model quality. Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors. We propose model weight reuse at different levels within our model architecture: (i) repeating full conformer block layers, (ii) sharing specific conformer modules across layers, (iii) sharing sub-components per conformer module, and (iv) sharing decomposed sub-component weights after low-rank decomposition. By sharing weights at different levels of our model, we can retain the full model in memory while increasing the number of virtual transformations applied to the input. Through a series of ablation studies and evaluations, we find that with weight sharing and a low-rank architecture, a 5M parameter model achieves WERs of 2.84 and 2.94 on LibriSpeech dev-clean and test-clean, respectively.
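To make the parameter arithmetic behind level (iv) concrete, here is a minimal pure-Python sketch; it is not the authors' implementation. It assumes a hypothetical residual projection of width d = 256, factored into a rank r = 32 pair (V maps d to r, U maps r to d) that is stored once and reused by 8 virtual layers. The names `SharedLowRankStack`, `dense_params`, and `low_rank_params`, the sizes, and the residual connection are illustrative choices, not details taken from the paper.

```python
"""Hypothetical sketch: one low-rank projection pair shared across layers.

Assumptions (not from the paper): width d, rank r, layer count, and sharing
a single (U, V) pair across every layer with a residual connection.
"""
import random
from typing import List

Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, seed: int) -> Matrix:
    # Small random weights; only the shapes matter for this sketch.
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    # y[i] = sum_j m[i][j] * x[j]
    return [sum(w * xj for w, xj in zip(row, x)) for row in m]


def dense_params(d_in: int, d_out: int) -> int:
    # A full d_in x d_out weight matrix.
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    # V is rank x d_in and U is d_out x rank.
    return rank * (d_in + d_out)


class SharedLowRankStack:
    """Applies the same low-rank projection num_layers times (hypothetical)."""

    def __init__(self, d: int, rank: int, num_layers: int, seed: int = 0):
        self.d = d
        self.rank = rank
        self.num_layers = num_layers
        self.v = rand_matrix(rank, d, seed)      # projects d -> rank
        self.u = rand_matrix(d, rank, seed + 1)  # projects rank -> d

    def param_count(self) -> int:
        # Weights are stored once, no matter how many layers reuse them.
        return low_rank_params(self.d, self.d, self.rank)

    def forward(self, x: List[float]) -> List[float]:
        # Each iteration is one "virtual" layer using the shared weights.
        for _ in range(self.num_layers):
            h = matvec(self.u, matvec(self.v, x))
            x = [xi + hi for xi, hi in zip(x, h)]  # residual connection
        return x


if __name__ == "__main__":
    d, rank, layers = 256, 32, 8
    print(layers * dense_params(d, d))           # 524288: dense, unshared
    print(layers * low_rank_params(d, d, rank))  # 131072: low-rank, unshared
    stack = SharedLowRankStack(d, rank, layers)
    print(stack.param_count())                   # 16384: low-rank, shared
    print(len(stack.forward([1.0] * d)))         # 256: output keeps width d
```

With these hypothetical sizes, a dense projection costs 65,536 parameters per layer (524,288 for 8 unshared layers) and the low-rank pair costs 16,384 per layer (131,072 unshared). Sharing one pair across all 8 layers keeps only 16,384 stored parameters while still applying 8 transformations to the input.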

Comments:	Accepted to IEEE ICASSP 2023
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2303.08343 [eess.AS]
	(or arXiv:2303.08343v1 [eess.AS] for this version)

Submission history

From: Steven M. Hernandez [view email]
[v1] Wed, 15 Mar 2023 03:21:38 GMT (20kb)
