Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

Triantafyllopoulos, Andreas; Song, Meishu; Yang, Zijiang; Jing, Xin; Schuller, Björn W.

(Submitted on 14 Jun 2022 (v1), last revised 20 Jun 2022 (this version, v2))

Abstract: In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction. The core contribution is an 'enrolment' encoder which utilises two unlabelled samples of the target speaker to adjust the output of the emotion encoder; the adjustment is based on dot-product attention, thus effectively functioning as a form of 'soft' feature selection. The emotion and enrolment encoders are based on two standard audio architectures: CNN14 and CNN10. The two encoders are further guided to forget or learn auxiliary emotion and/or speaker information. Our best approach achieves a CCC of $.650$ on the ExVo Few-Shot dev set, a $2.5\%$ increase over our baseline CNN14 CCC of $.634$.
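
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a CCC score and the quoted relative increase can be computed. It assumes that CCC stands for the concordance correlation coefficient in its usual definition with population (biased) variances; the function names `concordance_cc` and `relative_increase` are hypothetical and not taken from the paper, and the abstract does not say how scores are aggregated across emotion dimensions, so the sketch covers a single dimension only.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    Assumed definition: 2*cov / (var_pred + var_gold + (mean_pred - mean_gold)^2),
    using population (biased) variance and covariance.
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p, mean_g = fmean(pred), fmean(gold)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    var_g = fmean([(g - mean_g) ** 2 for g in gold])
    cov = fmean([(p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)])
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        # Both sequences are the same constant; the ratio is 0/0.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


def relative_increase(new, old):
    """Relative change of `new` over `old`, as a fraction."""
    return (new - old) / old


if __name__ == "__main__":
    # A constant offset between prediction and target lowers CCC
    # even though the two sequences are perfectly correlated.
    print(concordance_cc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
    # Relative increase of the reported best CCC over the baseline CCC.
    print(relative_increase(0.650, 0.634))
```

With the two scores from the abstract, `relative_increase(0.650, 0.634)` evaluates to roughly 0.025, which agrees with the stated $2.5\%$ increase.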

Comments:	Proceedings of the ICML Expressive Vocalizations Workshop and Competition held in conjunction with the 39th International Conference on Machine Learning, Copyright 2022 by the author(s)
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2206.06680 [cs.SD]
	(or arXiv:2206.06680v2 [cs.SD] for this version)

Submission history

From: Andreas Triantafyllopoulos
[v1] Tue, 14 Jun 2022 08:15:16 GMT (113kb,D)
[v2] Mon, 20 Jun 2022 15:29:57 GMT (113kb,D)
