DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

Chang, Heng-Jui; Yang, Shu-wen; Lee, Hung-yi

(Submitted on 5 Oct 2021 (v1), revised 6 Oct 2021 (this version, v2), latest version 28 Apr 2022 (v4))

Abstract: Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks. Despite their success, these methods require large amounts of memory and incur high pre-training costs, making them inaccessible to researchers in academia and small companies. Therefore, this paper introduces DistilHuBERT, a novel multi-task learning framework that distills hidden representations directly from a HuBERT model. This method reduces HuBERT's size by 75% and makes it 73% faster while retaining most of its performance on ten different tasks. Moreover, DistilHuBERT requires little training time and data, opening the possibility of pre-training personal and on-device self-supervised learning (SSL) models for speech.

Comments:	Submitted to ICASSP 2022
Subjects:	Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2110.01900 [cs.CL]
	(or arXiv:2110.01900v2 [cs.CL] for this version)

Submission history

From: Heng-Jui Chang
[v1] Tue, 5 Oct 2021 09:34:44 GMT (381kb,D)
[v2] Wed, 6 Oct 2021 15:51:03 GMT (381kb,D)
[v3] Mon, 24 Jan 2022 07:33:47 GMT (381kb,D)
[v4] Thu, 28 Apr 2022 02:13:09 GMT (382kb,D)
