Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Chen, Sanyuan; Wu, Yu; Wang, Chengyi; Liu, Shujie; Chen, Zhuo; Wang, Peidong; Liu, Gang; Li, Jinyu; Wu, Jian; Yu, Xiangzhan; Wei, Furu

(Submitted on 27 Apr 2022 (v1), last revised 27 Jun 2022 (this version, v2))

Abstract: Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even when the pre-training objective is designed for speech recognition. In this paper, we study which factors lead to the success of self-supervised learning on speaker-related tasks, e.g. speaker verification (SV), through a series of carefully designed experiments. Our empirical results on the VoxCeleb1 dataset suggest that the benefit of SSL to the SV task comes from a combination of the masked speech prediction loss, data scale, and model size, while the SSL quantizer has a minor impact. We further employ the integrated gradients attribution method and loss landscape visualization to understand why self-supervised learning is effective for speaker recognition.
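The abstract names the integrated gradients attribution method without describing it. As a general illustration (not the authors' code), integrated gradients attributes a model output F(x) to each input feature by averaging gradients along a straight path from a baseline x' to the input x: IG_i(x) = (x_i - x'_i) * ∫_0^1 ∂F(x' + α(x - x'))/∂x_i dα. The minimal sketch below approximates the integral with a midpoint Riemann sum. It assumes a caller-supplied gradient function (`grad_fn`, hypothetical) and an all-zero baseline. In the paper's setting F would be an SV model's output and x its acoustic input, details the abstract does not specify.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline=None, steps=50):
    """Approximate integrated gradients with a midpoint Riemann sum.

    grad_fn: hypothetical callable returning dF/dx for an input array.
    x: input features (illustrative; e.g. acoustic features in an SV model).
    baseline: reference input; an all-zero baseline is assumed if None.
    steps: number of sub-intervals used to approximate the path integral.
    """
    x = np.asarray(x, dtype=float)
    if baseline is None:
        baseline = np.zeros_like(x)
    baseline = np.asarray(baseline, dtype=float)

    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps

    # Accumulate gradients at points on the straight path baseline -> x.
    total_grad = np.zeros_like(x)
    for alpha in alphas:
        total_grad += grad_fn(baseline + alpha * (x - baseline))
    avg_grad = total_grad / steps

    # Scale the averaged gradient by the input's offset from the baseline.
    return (x - baseline) * avg_grad


if __name__ == "__main__":
    # Toy model F(x) = sum(w * x**2) with its analytic gradient 2 * w * x.
    w = np.array([0.5, -1.0, 2.0])
    f = lambda v: float(np.sum(w * v ** 2))
    grad_f = lambda v: 2.0 * w * v

    x = np.array([1.0, 2.0, -0.5])
    attr = integrated_gradients(grad_f, x)
    # Completeness check: attributions should sum to F(x) - F(baseline),
    # up to floating-point error.
    print(attr, attr.sum(), f(x) - f(np.zeros_like(x)))
```

The toy quadratic model is only there to make the result checkable: its gradient is linear along the path, so the midpoint rule integrates it exactly, and the attributions satisfy the completeness property (they sum to F(x) minus F at the baseline). For a real network, `grad_fn` would come from an autodiff framework, and more steps are usually needed.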

Comments:	Accepted by INTERSPEECH 2022
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.12765 [cs.CL]
	(or arXiv:2204.12765v2 [cs.CL] for this version)

Submission history

From: Sanyuan Chen [view email]
[v1] Wed, 27 Apr 2022 08:35:57 GMT (2439kb,D)
[v2] Mon, 27 Jun 2022 13:49:53 GMT (2463kb,D)