Ensemble knowledge distillation of self-supervised speech models

Huang, Kuan-Po; Feng, Tzu-hsun; Fu, Yu-Kuan; Hsu, Tsu-Yuan; Yen, Po-Chieh; Tseng, Wei-Cheng; Chang, Kai-Wei; Lee, Hung-yi

(Submitted on 24 Feb 2023)

Abstract: Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, jointly distilling multiple self-supervised speech models remains largely unexplored. In this work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We applied two aggregation techniques, layerwise-average and layerwise-concatenation, to the representations of the different teacher models and found that the former was more effective. In addition, we proposed a multiple-prediction-head method in which the student model predicts different layer outputs of multiple teacher models simultaneously. The experimental results show that our method improves the performance of the distilled models on four downstream speech processing tasks (Phoneme Recognition, Speaker Identification, Emotion Recognition, and Automatic Speech Recognition) in the hidden-set track of the SUPERB benchmark.
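
The two aggregation techniques and the multiple-prediction-head idea named in the abstract can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tensor shapes, the function names (`layerwise_average`, `layerwise_concat`, `multi_head_predict`), the use of plain linear heads, and the simple L1 distance used as the distillation loss. Random arrays stand in for real teacher and student representations.

```python
import numpy as np

def layerwise_average(teacher_reps):
    """Average the teachers' representations layer by layer.

    teacher_reps: list of arrays, each (num_layers, time, dim).
    Returns an array of shape (num_layers, time, dim).
    """
    return np.mean(np.stack(teacher_reps, axis=0), axis=0)

def layerwise_concat(teacher_reps):
    """Concatenate the teachers' representations along the feature axis.

    Returns an array of shape (num_layers, time, num_teachers * dim).
    """
    return np.concatenate(teacher_reps, axis=-1)

def multi_head_predict(student_hidden, heads):
    """Apply one linear head per (teacher, layer) pair to the student output.

    student_hidden: (time, student_dim)
    heads: dict mapping (teacher_idx, layer_idx) -> (student_dim, dim) weights
    """
    return {key: student_hidden @ w for key, w in heads.items()}

def l1_loss(pred, target):
    """Mean absolute error; an assumed stand-in for the actual loss."""
    return float(np.mean(np.abs(pred - target)))

# Hypothetical sizes; random data replaces real model outputs.
rng = np.random.default_rng(0)
num_teachers, num_layers, time_steps, dim, student_dim = 3, 2, 5, 8, 4
teachers = [rng.standard_normal((num_layers, time_steps, dim))
            for _ in range(num_teachers)]

avg_target = layerwise_average(teachers)  # one shared target per layer
cat_target = layerwise_concat(teachers)   # wider target per layer

student_hidden = rng.standard_normal((time_steps, student_dim))
heads = {(t, l): rng.standard_normal((student_dim, dim))
         for t in range(num_teachers) for l in range(num_layers)}

# Each head predicts one layer of one teacher; the losses are summed.
preds = multi_head_predict(student_hidden, heads)
total_loss = sum(l1_loss(preds[(t, l)], teachers[t][l]) for (t, l) in preds)
```

In this sketch, averaging keeps the target dimensionality fixed regardless of the number of teachers, whereas concatenation grows it linearly; the multi-head variant instead leaves every teacher's layers as separate targets.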

Comments:	Accepted by ICASSP 2023
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2302.12757 [eess.AS]
	(or arXiv:2302.12757v1 [eess.AS] for this version)

Submission history

From: Kuan-Po Huang
[v1] Fri, 24 Feb 2023 17:15:39 GMT (414kb,D)
