Computer Science > Sound
Title: Cross-modal Music Emotion Recognition Using Composite Loss-based Embeddings
(Submitted on 14 Dec 2021 (v1), revised 29 Jul 2022 (this version, v2), latest version 8 Apr 2023 (v5))
Abstract: Most music emotion recognition approaches use one-way classification or regression that estimates a general emotion from a distribution of music samples, without considering emotional variations (e.g., happiness can be further categorised into much, moderate or little happiness). We propose a cross-modal music emotion recognition approach that associates music samples with emotions in a common space by considering both their general and specific characteristics. Since the association of music samples with emotions is uncertain due to subjective human perception, we compute composite loss-based embeddings that are obtained by maximising two statistical characteristics: the correlation between music samples and emotions, based on canonical correlation analysis, and a probabilistic similarity between a music sample and an emotion, measured with KL-divergence. Experiments on two benchmark datasets demonstrate the superiority of our approach over one-way baselines. In addition, detailed analyses show that our approach can accomplish robust cross-modal music emotion recognition that not only identifies music samples matching a specific emotion but also detects emotions expressed in a given music sample.
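The composite objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the embedding distributions, the network, or how the two terms are weighted. The sketch assumes that each music sample and each emotion is embedded as a diagonal Gaussian (mean and variance vectors), uses the sum of linear canonical correlations between the two sets of means as the correlation term, and combines both terms with a hypothetical `weight` parameter. All function names are illustrative.

```python
import numpy as np

def inv_sqrt(mat, eps=1e-8):
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.maximum(vals, eps)          # guard against tiny/negative eigenvalues
    return (vecs * vals ** -0.5) @ vecs.T

def cca_correlation(x, y, reg=1e-4):
    """Sum of canonical correlations between two views (rows are paired samples)."""
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    # Regularised covariance and cross-covariance estimates.
    sxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    syy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    sxy = xc.T @ yc / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    t = inv_sqrt(sxx) @ sxy @ inv_sqrt(syy)
    return np.linalg.svd(t, compute_uv=False).sum()

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians, computed row by row."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0,
        axis=-1,
    )

def composite_loss(music_mu, music_var, emo_mu, emo_var, weight=1.0):
    """Assumed combination: maximise correlation, minimise KL (weight is hypothetical)."""
    corr = cca_correlation(music_mu, emo_mu)
    kl = kl_diag_gaussians(music_mu, music_var, emo_mu, emo_var).mean()
    return -corr + weight * kl

# Toy usage with random paired embeddings (synthetic data, for illustration only).
rng = np.random.default_rng(0)
music_mu = rng.normal(size=(200, 3))
emo_mu = music_mu + 0.1 * rng.normal(size=(200, 3))
ones = np.ones((200, 3))
loss = composite_loss(music_mu, ones, emo_mu, ones)
```

In this sketch a lower loss means the paired music and emotion embeddings are both more strongly correlated and closer as distributions; in practice the embeddings would be produced by trainable encoders and the loss minimised with an automatic-differentiation framework.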
Submission history
From: Naoki Takashima
[v1] Tue, 14 Dec 2021 06:54:08 GMT (1942kb,D)
[v2] Fri, 29 Jul 2022 16:33:38 GMT (2703kb,D)
[v3] Mon, 5 Sep 2022 06:42:52 GMT (6450kb,D)
[v4] Wed, 8 Feb 2023 15:43:56 GMT (4930kb,D)
[v5] Sat, 8 Apr 2023 06:26:50 GMT (4995kb,D)