Continuous Speech Separation with Recurrent Selective Attention Network

Zhang, Yixuan; Chen, Zhuo; Wu, Jian; Yoshioka, Takuya; Wang, Peidong; Meng, Zhong; Li, Jinyu

(Submitted on 28 Oct 2021)

Abstract: While permutation invariant training (PIT)-based continuous speech separation (CSS) significantly improves conversation transcription accuracy, it often suffers from speech leakage and separation failures in "hot spot" regions because it has a fixed number of output channels. In this paper, we propose to apply the recurrent selective attention network (RSAN) to CSS, which generates a variable number of output channels based on active speaker counting. In addition, we propose a novel block-wise dependency extension of RSAN that introduces dependencies between adjacent processing blocks in the CSS framework. This enables the network to use the separation results from previous blocks to facilitate processing of the current block. Experimental results on the LibriCSS dataset show that the RSAN-based CSS (RSAN-CSS) network consistently improves speech recognition accuracy over PIT-based models. The proposed block-wise dependency modeling further boosts the performance of RSAN-CSS.
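The idea of producing a variable number of output channels can be sketched as an iterative extraction loop. At each step, a network estimates one speaker's mask conditioned on a residual mask of the time-frequency bins not yet explained. It then subtracts that mask from the residual and stops once little is left. This Python/NumPy sketch is not the authors' implementation. The following are all assumptions made for illustration:

- the function names `rsan_separate` and `make_oracle_estimator`;
- `estimate_mask`, a stand-in for the trained network;
- the residual update `max(residual - mask, 0)`;
- the mean-residual stopping rule;
- the values of `max_speakers` and `stop_threshold`.

```python
import numpy as np


def rsan_separate(mixture_mag, estimate_mask, max_speakers=4, stop_threshold=0.1):
    """Hypothetical RSAN-style loop: extract one speaker mask per iteration.

    mixture_mag:   (freq, time) magnitude spectrogram of one processing block.
    estimate_mask: stand-in for the trained network; maps (mixture, residual) -> mask.
    """
    residual = np.ones_like(mixture_mag, dtype=float)  # nothing explained yet
    masks = []
    for _ in range(max_speakers):
        # Assumed stopping rule: the residual mask is (nearly) empty.
        if residual.mean() < stop_threshold:
            break
        mask = estimate_mask(mixture_mag, residual)
        masks.append(mask)
        # Remove the bins explained by this speaker from the residual.
        residual = np.maximum(residual - mask, 0.0)
    return masks  # one output channel per detected speaker


def make_oracle_estimator(true_masks):
    """Toy estimator for illustration: picks the oracle mask that best covers the residual."""
    def estimate_mask(mixture_mag, residual):
        scores = [float(np.sum(m * residual)) for m in true_masks]
        return true_masks[int(np.argmax(scores))]
    return estimate_mask


if __name__ == "__main__":
    # Three toy speakers, each active in two of six frames of a 4-bin spectrogram.
    freq, time = 4, 6
    true_masks = []
    for k in range(3):
        m = np.zeros((freq, time))
        m[:, 2 * k:2 * k + 2] = 1.0
        true_masks.append(m)
    mixture = np.ones((freq, time))
    masks = rsan_separate(mixture, make_oracle_estimator(true_masks))
    print(len(masks))  # prints 3: one output channel per toy speaker
```

The loop runs until the residual is exhausted. As a result, the number of returned masks follows the number of active speakers in the block rather than being fixed in advance, as it is in PIT-based CSS. The block-wise dependency proposed in the paper, which conditions the current block on the separation results of previous blocks, is not modeled in this sketch.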

Comments:	Submitted to ICASSP 2022
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2110.14838 [eess.AS]
	(or arXiv:2110.14838v1 [eess.AS] for this version)

Submission history

From: Yixuan Zhang
[v1] Thu, 28 Oct 2021 01:34:33 GMT (1470kb,D)