Continuous Speech Separation with Ad Hoc Microphone Arrays

Wang, Dongmei; Yoshioka, Takuya; Chen, Zhuo; Wang, Xiaofei; Zhou, Tianyan; Meng, Zhong

(Submitted on 3 Mar 2021)

Abstract: Speech separation has been shown to be effective for multi-talker speech recognition. In the ad hoc microphone array setup, where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome because the geometry and number of microphones are not known beforehand. Prior studies have shown that, with a spatio-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of an ad hoc array. In this paper, we further extend this approach to continuous speech separation. Several techniques are introduced to enable speech separation for real continuous recordings. First, we apply a transformer-based network for spatio-temporal modeling of the ad hoc array signals. In addition, two methods are proposed to mitigate a speech duplication problem during single-talker segments, which appears to be more severe in ad hoc array scenarios. One method is device distortion simulation, which reduces the acoustic mismatch between simulated training data and real recordings. The other is speaker counting, which detects single-speaker segments and merges the output signal channels. Experimental results for AdHoc-LibriCSS, a new dataset consisting of continuous recordings of concatenated LibriSpeech utterances captured by multiple different devices, show that the proposed separation method can significantly improve ASR accuracy for overlapped speech with little performance degradation for single-talker segments.
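The idea of using speaker counting to merge output channels can be illustrated with a minimal sketch. Everything in it is an assumption for illustration, not the paper's method: the function name `merge_single_speaker_frames` is hypothetical, the per-frame speaker counts are assumed to come from some external counter, and the merge rule (in frames judged to contain at most one speaker, keep the higher-energy output channel and silence the other) is one simple choice among many.

```python
from typing import List, Tuple


def merge_single_speaker_frames(
    out1: List[float],
    out2: List[float],
    speaker_counts: List[int],
    frame_len: int,
) -> Tuple[List[float], List[float]]:
    """Hypothetical post-processing for a two-output separator.

    speaker_counts[i] is an assumed per-frame estimate of active speakers
    for samples [i * frame_len, (i + 1) * frame_len). Where the estimate is
    <= 1, the higher-energy channel is kept in output 1 and output 2 is
    zeroed, so a single talker is not duplicated across both outputs.
    """
    if len(out1) != len(out2):
        raise ValueError("output channels must have equal length")
    merged1 = list(out1)
    merged2 = list(out2)
    for frame_idx, count in enumerate(speaker_counts):
        start = frame_idx * frame_len
        end = min(start + frame_len, len(out1))
        if start >= end or count > 1:
            continue  # overlapped (or out-of-range) frame: leave untouched
        # Frame energy of each separated output.
        e1 = sum(x * x for x in out1[start:end])
        e2 = sum(x * x for x in out2[start:end])
        dominant = out1 if e1 >= e2 else out2
        merged1[start:end] = dominant[start:end]
        merged2[start:end] = [0.0] * (end - start)
    return merged1, merged2


if __name__ == "__main__":
    # Frame 0 is judged single-speaker, frame 1 overlapped.
    m1, m2 = merge_single_speaker_frames(
        [0.1, 0.2, 1.0, 0.0], [0.5, 0.5, 0.0, 1.0], [1, 2], frame_len=2
    )
    print(m1, m2)
```

Switching channels frame by frame with a hard decision can introduce discontinuities at frame boundaries; a practical system would likely smooth the counter's decisions or cross-fade, which this sketch omits.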

Subjects:	Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2103.02378 [cs.SD]
	(or arXiv:2103.02378v1 [cs.SD] for this version)

Submission history

From: Dongmei Wang
[v1] Wed, 3 Mar 2021 13:01:08 GMT (135kb,D)
