Computer Science > Computation and Language
Title: Pre-Finetuning for Few-Shot Emotional Speech Recognition
(Submitted on 24 Feb 2023 (v1), last revised 28 Feb 2023 (this version, v2))
Abstract: Speech models have long been known to overfit individual speakers for many classification tasks. This leads to poor generalization in settings where the speakers are out-of-domain or out-of-distribution, as is common in production environments. We view speaker adaptation as a few-shot learning problem and propose investigating transfer learning approaches inspired by recent success with pre-trained models in natural language tasks. We propose pre-finetuning speech models on difficult tasks to distill knowledge into few-shot downstream classification objectives. We pre-finetune Wav2Vec2.0 on every permutation of four multiclass emotional speech recognition corpora and evaluate our pre-finetuned models through 33,600 few-shot fine-tuning trials on the Emotional Speech Dataset.
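The abstract says the model is pre-finetuned "on every permutation of four" corpora but does not spell out what counts as a permutation. The short Python sketch below illustrates one possible reading. It assumes each pre-finetuning schedule is an ordering of a non-empty subset of the four corpora, applied left to right. The corpus names are hypothetical placeholders, not the datasets used in the paper. Under this reading there are 4 + 12 + 24 + 24 = 64 schedules, and 24 of them use all four corpora.

```python
from itertools import permutations

# Hypothetical placeholder names; the abstract does not name the four corpora.
CORPORA = ["corpus_a", "corpus_b", "corpus_c", "corpus_d"]


def pre_finetuning_orders(corpora):
    """Yield every ordering of every non-empty subset of `corpora`.

    Assumption: each tuple is one pre-finetuning schedule, applied left to
    right before few-shot fine-tuning on the downstream dataset.
    """
    for k in range(1, len(corpora) + 1):
        # permutations(corpora, k) yields all ordered k-length selections.
        yield from permutations(corpora, k)


schedules = list(pre_finetuning_orders(CORPORA))
full_length = [s for s in schedules if len(s) == len(CORPORA)]
print(len(schedules), len(full_length))  # 64 24
```

If "every permutation" instead means only orderings of all four corpora, the `full_length` list (24 schedules) is the relevant set.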
Submission history
From: Maximillian Chen
[v1] Fri, 24 Feb 2023 22:38:54 GMT (108kb,D)
[v2] Tue, 28 Feb 2023 02:28:41 GMT (107kb,D)