We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

eess.AS

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Sound

Title: Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

Abstract: In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction. The core contribution is an `enrolment' encoder which utilises two unlabelled samples of the target speaker to adjust the output of the emotion encoder; the adjustment is based on dot-product attention, thus effectively functioning as a form of `soft' feature selection. The emotion and enrolment encoders are based on two standard audio architectures: CNN14 and CNN10. The two encoders are further guided to forget or learn auxiliary emotion and/or speaker information. Our best approach achieves a CCC of $.650$ on the ExVo Few-Shot dev set, a $2.5\%$ increase over our baseline CNN14 CCC of $.634$.
Comments: Proceedings of the ICML Expressive Vocalizations Workshop and Competition held in conjunction with the $\mathit{39}^{th}$ International Conference on Machine Learning, Copyright 2022 by the author(s)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2206.06680 [cs.SD]
  (or arXiv:2206.06680v2 [cs.SD] for this version)

Submission history

From: Andreas Triantafyllopoulos [view email]
[v1] Tue, 14 Jun 2022 08:15:16 GMT (113kb,D)
[v2] Mon, 20 Jun 2022 15:29:57 GMT (113kb,D)

Link back to: arXiv, form interface, contact.