Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation

Yang, Ziye; Guan, Shanzheng; Zhang, Xiao-Lei

(Submitted on 1 Dec 2020)

Abstract: Research on ad-hoc microphone arrays with deep learning has recently drawn much attention, especially in speech enhancement and separation. Because an ad-hoc microphone array may cover such a large area that multiple speakers may be located far apart and talk independently, target-dependent speech separation, which aims to extract a target speaker from mixed speech, is important for extracting and tracing a specific speaker in the ad-hoc array. However, this technique has not been explored yet. In this paper, we propose deep ad-hoc beamforming based on speaker extraction, which is, to our knowledge, the first work on target-dependent speech separation based on ad-hoc microphone arrays and deep learning. The algorithm contains three components. First, we propose a supervised channel selection framework based on speaker extraction, where the estimated utterance-level SNRs of the target speech are used as the basis for the channel selection. Second, we feed the selected channels to a deep-learning-based MVDR algorithm, where a single-channel speaker extraction algorithm is applied to each selected channel to estimate the mask of the target speech. We conducted an extensive experiment on a WSJ0-adhoc corpus. Experimental results demonstrate the effectiveness of the proposed method.
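
The two steps named in the abstract, SNR-based channel selection followed by mask-based MVDR, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that per-channel utterance-level SNR estimates and time-frequency masks are already available from some speaker extraction network, that a simple top-k rule is used for selection, and that the per-channel masks are averaged before estimating the covariance matrices. The function names `select_channels` and `mask_based_mvdr` and all parameters are hypothetical.

```python
import numpy as np


def select_channels(est_snr_db, k):
    """Return the indices of the k channels with the highest estimated SNR.

    Assumption: a plain top-k rule; the paper's selection scheme may differ.
    """
    est_snr_db = np.asarray(est_snr_db, dtype=float)
    order = np.argsort(-est_snr_db, kind="stable")  # descending SNR
    return order[:k]


def mask_based_mvdr(Y, speech_mask, ref=0, eps=1e-8):
    """Mask-based MVDR beamformer (reference-channel formulation).

    Y           : complex STFT of the selected channels, shape (C, F, T)
    speech_mask : target-speech masks in [0, 1], shape (C, F, T)
    ref         : index of the reference channel among the selected ones
    Returns the beamformed STFT, shape (F, T).
    """
    C, F, T = Y.shape
    # Assumption: average the per-channel masks into one mask per T-F bin.
    m_s = speech_mask.mean(axis=0)
    m_n = 1.0 - m_s
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Yf = Y[:, f, :]  # (C, T)
        # Mask-weighted spatial covariance matrices of speech and noise.
        phi_s = (m_s[f] * Yf) @ Yf.conj().T / (m_s[f].sum() + eps)
        phi_n = (m_n[f] * Yf) @ Yf.conj().T / (m_n[f].sum() + eps)
        phi_n = phi_n + eps * np.eye(C)  # diagonal loading for stability
        # w = (phi_n^{-1} phi_s / trace(phi_n^{-1} phi_s)) u_ref
        num = np.linalg.solve(phi_n, phi_s)
        w = num[:, ref] / (np.trace(num) + eps)
        out[f] = w.conj() @ Yf  # w^H y for every frame
    return out


# Usage with synthetic data: 4 channels, keep the best 2, then beamform.
rng = np.random.default_rng(0)
snr_db = [3.0, 10.0, -2.0, 7.5]            # hypothetical SNR estimates
idx = select_channels(snr_db, k=2)
Y_all = rng.standard_normal((4, 5, 20)) + 1j * rng.standard_normal((4, 5, 20))
masks = rng.uniform(0.0, 1.0, size=(4, 5, 20))
enhanced = mask_based_mvdr(Y_all[idx], masks[idx])
```

Averaging the masks is only one way to combine per-channel estimates; the covariance matrices could also be built from channel-specific masks.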

Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2012.00403 [cs.SD]
	(or arXiv:2012.00403v1 [cs.SD] for this version)

Submission history

From: Ziye Yang
[v1] Tue, 1 Dec 2020 11:06:36 GMT (467kb,D)
