Computer Science > Sound
Title: Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction
(Submitted on 14 Jun 2022 (v1), last revised 20 Jun 2022 (this version, v2))
Abstract: In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction. The core contribution is an 'enrolment' encoder which utilises two unlabelled samples of the target speaker to adjust the output of the emotion encoder; the adjustment is based on dot-product attention, thus effectively functioning as a form of 'soft' feature selection. The emotion and enrolment encoders are based on two standard audio architectures: CNN14 and CNN10. The two encoders are further guided to forget or learn auxiliary emotion and/or speaker information. Our best approach achieves a concordance correlation coefficient (CCC) of $.650$ on the ExVo Few-Shot dev set, a $2.5\%$ relative increase over our baseline CNN14 CCC of $.634$.
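To make the abstract's two technical ingredients concrete, the following is a minimal sketch in plain Python. The `ccc` function uses the standard definition of the concordance correlation coefficient. The `soft_select` function is only one possible reading of "dot-product attention as soft feature selection": the function name, the averaging of the two enrolment embeddings, and the per-feature softmax are all assumptions made for illustration and are not taken from the paper.

```python
import math


def ccc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    return 2 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)


def soft_select(emotion_emb, enrol_embs):
    """Hypothetical soft feature selection (an assumption, not the paper's model).

    Averages the enrolment embeddings, scores each feature by the
    corresponding term of the dot product with the emotion embedding,
    and rescales the emotion features by the softmax of those scores.
    """
    dim = len(emotion_emb)
    # Mean enrolment embedding over the available enrolment samples.
    mean_enrol = [sum(e[i] for e in enrol_embs) / len(enrol_embs) for i in range(dim)]
    # Per-feature scores: the individual terms of the dot product.
    scores = [emotion_emb[i] * mean_enrol[i] for i in range(dim)]
    # Numerically stable softmax over the feature axis.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    gated = [w * x for w, x in zip(weights, emotion_emb)]
    return weights, gated


# Relative gain of the reported best CCC over the reported baseline CCC.
relative_gain = (0.650 - 0.634) / 0.634
```

The `relative_gain` value shows that the quoted $2.5\%$ is a relative improvement; the absolute difference between the two CCC scores is $.016$.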
Submission history
From: Andreas Triantafyllopoulos
[v1] Tue, 14 Jun 2022 08:15:16 GMT (113kb,D)
[v2] Mon, 20 Jun 2022 15:29:57 GMT (113kb,D)