Fusion approaches for emotion recognition from speech using acoustic and text-based features

Pepino, Leonardo; Riera, Pablo; Ferrer, Luciana; Gravano, Agustin

doi:10.1109/ICASSP40776.2020.9054709

(Submitted on 27 Mar 2024)

Abstract: In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. We propose to obtain contextualized word embeddings with BERT to represent the information contained in speech transcriptions and show that this results in better performance than using GloVe embeddings. We also propose and compare different strategies to combine the audio and text modalities, evaluating them on the IEMOCAP and MSP-PODCAST datasets. We find that fusing acoustic and text-based systems is beneficial on both datasets, though only subtle differences are observed across the evaluated fusion approaches. Finally, for IEMOCAP, we show the large effect that the criteria used to define the cross-validation folds have on results. In particular, the standard way of creating folds for this dataset results in a highly optimistic estimation of performance for the text-based system, suggesting that some previous works may overestimate the advantage of incorporating transcriptions.

Comments:	5 pages. Accepted at ICASSP 2020
Subjects:	Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
DOI:	10.1109/ICASSP40776.2020.9054709
Cite as:	arXiv:2403.18635 [cs.LG]
	(or arXiv:2403.18635v1 [cs.LG] for this version)

Submission history

From: Leonardo Pepino
[v1] Wed, 27 Mar 2024 14:40:25 GMT (467kb,D)
