Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings

Karra, Kiran; McCree, Alan

(Submitted on 6 Apr 2021 (v1), last revised 14 Jun 2021 (this version, v3))

Abstract: Many modern systems for speaker diarization, such as the recently developed VBx approach, rely on clustering of DNN speaker embeddings followed by resegmentation. Two problems with this approach are that the DNN is not directly optimized for the diarization task, and that the parameters need significant retuning for different applications. We have recently presented progress toward addressing these problems with a Leave-One-Out Gaussian PLDA (LGP) clustering algorithm and an approach to training the DNN such that embeddings directly optimize performance of this scoring method. This paper presents a new two-pass version of this system, where the second pass uses finer time resolution to significantly improve overall performance. For the Callhome corpus, we achieve the first published error rate below 4% without any task-dependent parameter tuning. We also show significant progress towards a robust single solution for multiple diarization tasks.

Comments: 5 pages, 2 figures, accepted at INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Signal Processing (eess.SP)
Cite as: arXiv:2104.02469 [eess.AS] (or arXiv:2104.02469v3 [eess.AS] for this version)

Submission history

From: Kiran Karra
[v1] Tue, 6 Apr 2021 12:52:55 GMT (749kb,D)
[v2] Wed, 7 Apr 2021 01:39:17 GMT (748kb,D)
[v3] Mon, 14 Jun 2021 22:23:12 GMT (167kb,D)
