Speech Emotion Recognition Using Quaternion Convolutional Neural Networks

Muppidi, Aneesh; Radfar, Martin

(Submitted on 31 Oct 2021)

Abstract: Although speech recognition has become a widespread technology, inferring emotion from speech signals remains a challenge. To address this problem, this paper proposes a quaternion convolutional neural network (QCNN) based speech emotion recognition (SER) model in which Mel-spectrogram features of speech signals are encoded in an RGB quaternion domain. We show that our QCNN-based SER model outperforms real-valued methods on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS, 8 classes) dataset, achieving, to the best of our knowledge, state-of-the-art results. The QCNN also achieves results comparable to state-of-the-art methods on the Interactive Emotional Dyadic Motion Capture (IEMOCAP, 4 classes) and Berlin EMO-DB (7 classes) datasets. Specifically, the model achieves accuracies of 77.87%, 70.46%, and 88.78% on the RAVDESS, IEMOCAP, and EMO-DB datasets, respectively. In addition, our results show that the quaternion unit structure encodes internal dependencies more effectively, which significantly reduces model size compared to other methods.
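
As an illustration of the quaternion ideas the abstract refers to, the following is a minimal sketch and not the authors' implementation. It assumes the common convention of encoding an RGB pixel as a pure quaternion 0 + Ri + Gj + Bk (the abstract does not spell out the exact encoding), shows the Hamilton product on which quaternion layers are built, and compares weight counts for a dense real-valued layer and a quaternion layer of the same width. The function names `encode_rgb_pixel`, `hamilton_product`, and `weight_counts` are hypothetical.

```python
def encode_rgb_pixel(r, g, b):
    """Assumed encoding: a pure quaternion 0 + R*i + G*j + B*k."""
    return (0.0, float(r), float(g), float(b))


def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def weight_counts(in_channels, out_channels):
    """Weights in a dense real layer vs. a quaternion layer of equal width.

    Assumes both channel counts are divisible by 4, so that every group
    of four real channels forms one quaternion channel.
    """
    real = in_channels * out_channels
    # Each quaternion weight holds 4 real numbers and links one
    # quaternion input channel to one quaternion output channel.
    quat = 4 * (in_channels // 4) * (out_channels // 4)
    return real, quat


if __name__ == "__main__":
    pixel = encode_rgb_pixel(0.2, 0.5, 0.9)
    weight = (0.5, 0.5, 0.5, 0.5)  # an arbitrary unit quaternion
    print(hamilton_product(weight, pixel))
    print(weight_counts(64, 128))
```

Because one quaternion weight is shared across all four components of its input, a quaternion layer needs a quarter of the weights of a real-valued layer with the same number of input and output channels. This is a general property of quaternion layers and is offered here only as background for the model-size claim in the abstract.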

Comments:	Published in ICASSP 2021
Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2111.00404 [cs.SD]
	(or arXiv:2111.00404v1 [cs.SD] for this version)

Submission history

From: Martin Radfar
[v1] Sun, 31 Oct 2021 04:06:07 GMT (2513kb,D)
