Computer Science > Sound
Title: Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation
(Submitted on 29 Oct 2020 (this version), latest version 22 Aug 2021 (v3))
Abstract: Speech separation is a well-developed field, but some problems remain unsolved. The main problem we focus on in this paper is the frequent label permutation switching of permutation invariant training (PIT). For N-speaker separation, there are N! possible label permutations. How to stably select the correct label permutation is a long-standing problem. In this paper, we use self-supervised pre-training to stabilize the label permutations. Among several types of self-supervised tasks, pre-training tasks based on speech enhancement prove significantly effective in our experiments. When off-the-shelf pre-trained models are used, training duration can be shortened to between one-third and two-thirds of the original. Furthermore, even when pre-training time is taken into account, the entire training process can still be shorter, without a performance drop, when a larger batch size is used.
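To make the N! permutation search concrete, here is a minimal sketch of an utterance-level PIT loss. It is an illustration only, not the authors' implementation: the function names `mse` and `pit_loss` are hypothetical, and mean squared error stands in for whatever separation objective the paper actually uses.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, references):
    """Try every assignment of estimates to references and keep the best.

    Assumption: MSE is used as the per-speaker loss purely for illustration.
    Returns (lowest average loss, permutation that achieves it), where
    perm[i] is the index of the reference assigned to estimate i.
    """
    n = len(references)
    best_loss, best_perm = float("inf"), None
    # For N speakers this loop visits all N! label permutations.
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy two-speaker example: the model outputs the speakers in swapped order.
refs = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
ests = [[-0.9, -1.1, -1.0], [1.0, 0.9, 1.1]]
loss, perm = pit_loss(ests, refs)
```

In this toy case the selected permutation maps estimate 0 to reference 1 and estimate 1 to reference 0. Label permutation switching, as described in the abstract, corresponds to the selected permutation for the same utterance changing from one training step to another.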
Submission history
From: Sung-Feng Huang
[v1] Thu, 29 Oct 2020 06:07:01 GMT (1105kb,D)
[v2] Tue, 8 Jun 2021 15:31:15 GMT (721kb,D)
[v3] Sun, 22 Aug 2021 06:26:39 GMT (721kb,D)