Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

Huang, Sung-Feng; Chuang, Shun-Po; Liu, Da-Rong; Chen, Yi-Chen; Yang, Gene-Ping; Lee, Hung-yi

(Submitted on 29 Oct 2020 (this version), latest version 22 Aug 2021 (v3))

Abstract: Speech separation is a well-developed field, but open problems remain. This paper focuses on one of them: the frequent label permutation switching that occurs in permutation invariant training (PIT). For N-speaker separation there are N! possible label permutations, and stably selecting the correct one is a long-standing problem. We use self-supervised pre-training to stabilize the label permutations. Among several types of self-supervised tasks, pre-training tasks based on speech enhancement prove significantly effective in our experiments. With off-the-shelf pre-trained models, training duration can be shortened to between one-third and two-thirds. Furthermore, even when pre-training time is taken into account, the entire training process can still be shorter, without a performance drop, when a larger batch size is used.
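
To make the N! label permutations concrete, the following is a minimal sketch of a PIT-style loss that tries every assignment of model outputs to reference speakers and keeps the cheapest one. It is an illustration only, not the authors' implementation: the function name `pit_mse`, the use of mean squared error as the per-permutation loss, and the toy random signals are all assumptions made for this example.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, targets):
    """Return (min_loss, best_perm) over all N! speaker assignments.

    estimates, targets: arrays of shape (N, T), one row per speaker.
    Hypothetical helper; MSE is an assumed stand-in for the real loss.
    """
    n = targets.shape[0]
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # perm[i] is the index of the target assigned to estimate i.
        loss = float(np.mean((estimates - targets[list(perm)]) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy two-speaker case: the "model" emits the sources in swapped order.
rng = np.random.default_rng(0)
targets = rng.standard_normal((2, 16))
estimates = targets[[1, 0]]

loss, perm = pit_mse(estimates, targets)
num_perms_3spk = len(list(permutations(range(3))))  # 3! assignments
```

In this toy case the minimum is reached by the swapped assignment. Label permutation switching, in the abstract's sense, is when the selected assignment for the same training mixture keeps changing during training; the exhaustive search above also shows why the cost of selection grows factorially with the number of speakers.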

Comments:	Submitted to ICASSP 2021
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2010.15366 [cs.SD]
	(or arXiv:2010.15366v1 [cs.SD] for this version)

Submission history

From: Sung-Feng Huang
[v1] Thu, 29 Oct 2020 06:07:01 GMT (1105kb,D)
[v2] Tue, 8 Jun 2021 15:31:15 GMT (721kb,D)
[v3] Sun, 22 Aug 2021 06:26:39 GMT (721kb,D)