MIRNet: Learning multiple identities representations in overlapped speech

Han, Hyewon; Chung, Soo-Whan; Kang, Hong-Goo

(Submitted on 4 Aug 2020 (v1), last revised 6 Aug 2020 (this version, v2))

Abstract: Many approaches can derive information about a single speaker's identity from speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when a given signal contains multiple concurrent speakers. In this paper, we propose a novel deep speaker representation strategy that can reliably extract multiple speaker identities from overlapped speech. We design a network that extracts a high-level embedding containing information about each speaker's identity from a given mixture. Unlike conventional approaches that need reference acoustic features for training, our proposed algorithm requires only the speaker identity labels of the overlapped speech segments. We demonstrate the effectiveness and usefulness of our algorithm in a speaker verification task and in a speech separation system conditioned on the target speaker embeddings obtained through the proposed method.

Comments:	Accepted in Interspeech 2020
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2008.01698 [eess.AS]
	(or arXiv:2008.01698v2 [eess.AS] for this version)

Submission history

From: Hyewon Han
[v1] Tue, 4 Aug 2020 16:55:14 GMT (876kb,D)
[v2] Thu, 6 Aug 2020 16:44:34 GMT (876kb,D)
