Ultra Fast Speech Separation Model with Teacher Student Learning

Chen, Sanyuan; Wu, Yu; Chen, Zhuo; Wu, Jian; Yoshioka, Takuya; Liu, Shujie; Li, Jinyu; Yu, Xiangzhan

doi:10.21437/Interspeech.2021-142

(Submitted on 27 Apr 2022)

Abstract: The Transformer has recently been applied successfully to speech separation, thanks to the strong long-range dependency modeling of its self-attention mechanism. However, Transformer models tend to have heavy run-time costs because of their deep encoder layers, which hinders deployment on edge devices. A small Transformer model with fewer encoder layers is preferable for computational efficiency, but it is prone to performance degradation. In this paper, we propose an ultra fast speech separation Transformer model that achieves both better performance and efficiency through teacher-student learning (T-S learning). We introduce layer-wise T-S learning and objective shifting mechanisms to guide the small student model to learn intermediate representations from the large teacher model. Compared with the small Transformer model trained from scratch, the proposed T-S learning method reduces the word error rate (WER) by more than 5% for both multi-channel and single-channel speech separation on the LibriCSS dataset. By utilizing more unlabeled speech data, our ultra fast speech separation models achieve more than 10% relative WER reduction.
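
The abstract names two mechanisms, layer-wise T-S learning and objective shifting, without giving their formulas. The following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each student layer is matched to an evenly spaced teacher layer with a mean squared error, and that a linear schedule moves the training objective from the layer-wise loss to the task loss. The function names, the layer pairing rule, and the linear schedule are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layerwise_ts_loss(
    student_layers: List[Sequence[float]],
    teacher_layers: List[Sequence[float]],
) -> float:
    """Average MSE between each student layer and its paired teacher layer.

    Assumption (not from the abstract): student layer i is paired with
    teacher layer (i + 1) * stride - 1, where
    stride = len(teacher_layers) // len(student_layers).
    """
    stride = len(teacher_layers) // len(student_layers)
    if stride < 1:
        raise ValueError("the teacher must have at least as many layers as the student")
    losses = [
        mse(s, teacher_layers[(i + 1) * stride - 1])
        for i, s in enumerate(student_layers)
    ]
    return sum(losses) / len(losses)


def shift_weight(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: 1.0 at step 0, 0.0 from total_steps onward."""
    return max(0.0, 1.0 - step / total_steps)


def combined_loss(ts_loss: float, task_loss: float, step: int, total_steps: int) -> float:
    """Blend the layer-wise T-S loss and the task loss with the shifting weight."""
    w = shift_weight(step, total_steps)
    return w * ts_loss + (1.0 - w) * task_loss


# Toy usage: a 4-layer "teacher" and a 2-layer "student" with 2-dim outputs.
teacher = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
student = [[1.0, 1.0], [3.0, 5.0]]
ts = layerwise_ts_loss(student, teacher)
print(ts, combined_loss(ts, task_loss=3.0, step=50, total_steps=100))
```

Under this reading, early training is dominated by imitation of the teacher's intermediate representations, and late training by the separation objective itself. The actual loss functions and schedule are described in the paper, not in this abstract.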

Comments:	Accepted by Interspeech 2021
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
DOI:	10.21437/Interspeech.2021-142
Cite as:	arXiv:2204.12777 [eess.AS]
	(or arXiv:2204.12777v1 [eess.AS] for this version)

Submission history

From: Sanyuan Chen
[v1] Wed, 27 Apr 2022 09:02:45 GMT (151kb,D)
