Trustworthy Multimodal Fusion for Sentiment Analysis in Ordinal Sentiment Space

Xie, Zhuyang; Yang, Yan; Wang, Jie; Liu, Xiaorong; Li, Xiaofan

doi:10.1109/TCSVT.2024.3376564

Title: Trustworthy Multimodal Fusion for Sentiment Analysis in Ordinal Sentiment Space

Authors: Zhuyang Xie, Yan Yang, Jie Wang, Xiaorong Liu, Xiaofan Li

(Submitted on 13 Apr 2024)

Abstract: Multimodal video sentiment analysis integrates information from multiple modalities to analyze the opinions and attitudes of speakers. Most previous work focuses on the semantic interactions within and between modalities. However, these works ignore the reliability of the modalities, which often suffer from noise, semantic ambiguity, or missing inputs. In addition, previous multimodal approaches treat all modalities equally, largely ignoring their different contributions. Furthermore, existing multimodal sentiment analysis methods regress sentiment scores directly without considering the ordinal relationships among sentiment categories, which limits their performance. To address these problems, we propose a trustworthy multimodal sentiment ordinal network (TMSON). Specifically, we first devise a unimodal feature extractor for each modality to obtain modality-specific features. Then, a customized uncertainty distribution estimation network estimates the unimodal uncertainty distributions. Next, Bayesian fusion is performed on the learned unimodal distributions to obtain multimodal distributions for sentiment prediction. Finally, an ordinal-aware sentiment space is constructed, in which ordinal regression constrains the multimodal distributions. TMSON outperforms baselines on multimodal sentiment analysis tasks, and empirical results demonstrate that it reduces uncertainty and yields more robust predictions.
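
The Bayesian fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the form of the unimodal distributions, so this example assumes, purely for illustration, that each modality yields an independent one-dimensional Gaussian over the sentiment score. Under that assumption, fusion reduces to a precision-weighted product of Gaussians. The names `Gaussian` and `fuse`, and the numeric values, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Hypothetical unimodal estimate: predicted score plus its uncertainty."""
    mean: float
    var: float  # variance; larger means the modality is less reliable


def fuse(dists):
    """Fuse independent 1-D Gaussians by multiplying their densities.

    The product of Gaussians is again Gaussian, with precision (1 / variance)
    equal to the sum of the input precisions and a precision-weighted mean.
    This is an assumed stand-in, not the paper's exact fusion rule.
    """
    if not dists:
        raise ValueError("need at least one distribution to fuse")
    precision = sum(1.0 / d.var for d in dists)
    mean = sum(d.mean / d.var for d in dists) / precision
    return Gaussian(mean=mean, var=1.0 / precision)


# Illustrative values only: a confident text estimate and noisier
# audio and video estimates.
text = Gaussian(mean=1.2, var=0.25)
audio = Gaussian(mean=-0.4, var=4.0)
video = Gaussian(mean=0.8, var=1.0)

fused = fuse([text, audio, video])
print(f"fused mean={fused.mean:.3f}, var={fused.var:.3f}")
```

In this sketch the fused variance is always smaller than the smallest input variance, and low-variance modalities dominate the fused mean, which matches the abstract's point that modalities should not contribute equally.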

Comments:	14 pages, 9 figures, Accepted by IEEE Transactions on Circuits and Systems for Video Technology
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
DOI:	10.1109/TCSVT.2024.3376564
Cite as:	arXiv:2404.08923 [cs.CV]
	(or arXiv:2404.08923v1 [cs.CV] for this version)

Submission history

From: Zhuyang Xie
[v1] Sat, 13 Apr 2024 08:15:57 GMT (1401kb,D)
