Improved Large-margin Softmax Loss for Speaker Diarisation

Fathullah, Yassir; Zhang, Chao; Woodland, Philip C.

doi:10.1109/ICASSP40776.2020.9053373

Electrical Engineering and Systems Science > Audio and Speech Processing

Authors: Yassir Fathullah, Chao Zhang, Philip C. Woodland

(Submitted on 10 Nov 2019 (v1), last revised 6 Jul 2020 (this version, v3))

Abstract: Speaker diarisation systems nowadays use embeddings generated from speech segments in a bottleneck layer, and these embeddings need to be discriminative for unseen speakers. Large-margin training is well known to improve generalisation to unseen data, and it has been widely used in such open-set problems. This paper therefore introduces a general approach to the large-margin softmax loss, without any approximations, to improve the quality of speaker embeddings for diarisation. It also proposes a novel and simple way to stabilise training when large-margin softmax is used. Finally, to combat the effect of overlapping speech, different training margins are used to reduce the negative impact that overlapping speech has on learning discriminative embeddings. Experiments on the AMI meeting corpus show that large-margin softmax significantly improves the speaker error rate (SER). Using all hyperparameters of the loss in a unified way gave further improvements, reaching a 24.6% relative SER reduction over the baseline. The best result was obtained by training overlapping and single-speaker speech samples with different margins, giving an overall 29.5% relative SER reduction over the baseline.
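The abstract does not give the loss formula. As a minimal sketch, assume the combined-margin form common in the large-margin softmax literature: with a unit-normalised embedding x and unit-normalised class weights w_j, let cos(theta_j) = w_j . x. The target logit then becomes s * (cos(m1 * theta_y + m2) - m3), and all other logits stay s * cos(theta_j). Here m1 (multiplicative angular margin), m2 (additive angular margin), m3 (additive cosine margin) and the scale s are assumed names, not the paper's notation. With (m1, m2, m3) = (1, 0, 0) the loss reduces to a scaled, normalised softmax. The per-sample margin choice in `batch_loss` only illustrates the idea of training overlapping and single-speaker samples with different margins. The function names, the default scale, the angle cap at pi and every margin value are hypothetical placeholders, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical margin triple (m1, m2, m3); names are illustrative, not the paper's notation.
Margins = Tuple[float, float, float]


def _normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def large_margin_softmax_loss(
    embedding: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    label: int,
    margins: Margins = (1.0, 0.0, 0.0),
    scale: float = 30.0,  # placeholder value, not taken from the paper
) -> float:
    """Cross-entropy over margin-adjusted cosine logits for a single sample.

    Target logit: s * (cos(m1 * theta_y + m2) - m3); other logits: s * cos(theta_j).
    """
    m1, m2, m3 = margins
    x = _normalise(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(_normalise(w), x))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos in its domain despite rounding
        if j == label:
            theta = math.acos(cos_theta)
            # Assumption: cap the angle at pi so the target logit decreases monotonically in theta.
            angle = min(m1 * theta + m2, math.pi)
            logits.append(scale * (math.cos(angle) - m3))
        else:
            logits.append(scale * cos_theta)
    # Numerically stable log-sum-exp; loss = log Z - target logit.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_z - logits[label]


def batch_loss(
    batch: Sequence[Tuple[Sequence[float], int, bool]],
    class_weights: Sequence[Sequence[float]],
    single_margins: Margins,
    overlap_margins: Margins,
    scale: float = 30.0,
) -> float:
    """Mean loss over (embedding, label, is_overlap) samples, picking margins per sample."""
    total = 0.0
    for emb, label, is_overlap in batch:
        margins = overlap_margins if is_overlap else single_margins
        total += large_margin_softmax_loss(emb, class_weights, label, margins, scale)
    return total / len(batch)


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]  # two toy speaker classes
    samples = [([0.9, 0.1], 0, False), ([0.6, 0.4], 0, True)]
    # Illustrative margins only: a larger margin for single-speaker segments.
    print(batch_loss(samples, weights, (1.0, 0.2, 0.0), (1.0, 0.05, 0.0)))
```

Because each sample chooses its own margin triple, segments labelled as overlapping can be trained with a different (for example smaller) margin without changing the rest of the loss. Which direction and size of margin actually works best is an empirical question that this sketch does not settle.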

Comments:	ICASSP 2020
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Journal reference:	ICASSP 2020, Barcelona, Spain, 2020, pp. 7104-7108
DOI:	10.1109/ICASSP40776.2020.9053373
Cite as:	arXiv:1911.03970 [eess.AS]
	(or arXiv:1911.03970v3 [eess.AS] for this version)

Submission history

From: Yassir Fathullah
[v1] Sun, 10 Nov 2019 17:41:11 GMT (1139kb,D)
[v2] Fri, 14 Feb 2020 17:34:53 GMT (1139kb,D)
[v3] Mon, 6 Jul 2020 09:32:49 GMT (1017kb,D)
