Speaker activity driven neural speech extraction

Delcroix, Marc; Zmolikova, Katerina; Ochiai, Tsubasa; Kinoshita, Keisuke; Nakatani, Tomohiro

(Submitted on 14 Jan 2021 (v1), last revised 9 Feb 2021 (this version, v2))

Abstract: Target speech extraction, which extracts the speech of a target speaker in a mixture given auxiliary speaker clues, has recently received increased interest. Various clues have been investigated such as pre-recorded enrollment utterances, direction information, or video of the target speaker. In this paper, we explore the use of speaker activity information as an auxiliary clue for single-channel neural network-based speech extraction. We propose a speaker activity driven speech extraction neural network (ADEnet) and show that it can achieve performance levels competitive with enrollment-based approaches, without the need for pre-recordings. We further demonstrate the potential of the proposed approach for processing meeting-like recordings, where the speaker activity is obtained from a diarization system. We show that this simple yet practical approach can successfully extract speakers after diarization, which results in improved ASR performance, especially in high overlapping conditions, with a relative word error rate reduction of up to 25%.

Comments:	To appear in ICASSP 2021
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2101.05516 [eess.AS]
	(or arXiv:2101.05516v2 [eess.AS] for this version)

Submission history

From: Marc Delcroix
[v1] Thu, 14 Jan 2021 09:21:51 GMT (2447kb,D)
[v2] Tue, 9 Feb 2021 23:33:59 GMT (2447kb,D)
