Adaptive Fusion Techniques for Multimodal Data

Sahu, Gaurav; Vechtomova, Olga

(Submitted on 10 Nov 2019 (v1), last revised 26 Jan 2021 (this version, v2))

Abstract: Effective fusion of data from multiple modalities, such as video, speech, and text, is challenging due to the heterogeneous nature of multimodal data. In this paper, we propose adaptive fusion techniques that aim to model context from different modalities effectively. Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide "how" to combine a given set of multimodal features more effectively. We propose two networks: 1) Auto-Fusion, which learns to compress information from different modalities while preserving the context, and 2) GAN-Fusion, which regularizes the learned latent space given context from complementing modalities. A quantitative evaluation on the tasks of multimodal machine translation and emotion recognition suggests that our lightweight, adaptive networks can better model context from other modalities than existing methods, many of which employ massive transformer-based networks.
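
As a rough illustration of the contrast the abstract draws between a fixed fusion operation and a learned one, the sketch below compares plain concatenation with a small bottleneck that compresses the concatenated features and is scored by how well they can be reconstructed. Reading "compress information ... while preserving the context" as an autoencoder-style reconstruction objective is an assumption made here for illustration; the class name, dimensions, random weights, and toy feature vectors are all hypothetical and are not taken from the paper. No training loop is shown, and GAN-Fusion is not covered.

```python
import random


def concat_fusion(features):
    """Deterministic baseline: join the per-modality vectors end to end."""
    return [x for vec in features for x in vec]


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; in a real model these would be learned."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LearnedFusionSketch:
    """Hypothetical linear bottleneck over concatenated features (assumption)."""

    def __init__(self, input_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = random_matrix(latent_dim, input_dim, rng)
        self.decoder = random_matrix(input_dim, latent_dim, rng)

    def fuse(self, features):
        # Compress the concatenated vector into a shorter fused vector.
        return matvec(self.encoder, concat_fusion(features))

    def reconstruction_loss(self, features):
        # Mean squared error between the concatenated input and its
        # reconstruction from the fused vector; training would minimize this.
        target = concat_fusion(features)
        recon = matvec(self.decoder, self.fuse(features))
        return sum((r - t) ** 2 for r, t in zip(recon, target)) / len(target)


# Toy feature vectors standing in for text, speech, and video encoders.
text = [0.2, -0.1, 0.4, 0.0]
speech = [0.5, 0.3, -0.2]
video = [0.1, 0.9, -0.4, 0.3, 0.0]

model = LearnedFusionSketch(input_dim=12, latent_dim=4)
fused = model.fuse([text, speech, video])
loss = model.reconstruction_loss([text, speech, video])
print(len(concat_fusion([text, speech, video])), len(fused), loss)
```

With these toy inputs the concatenated vector has 12 entries and the fused vector has 4. The weights are untrained, so the loss value itself is not meaningful; the point is only that the fused representation is produced by parameters that could be optimized, rather than by a fixed operation.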

Comments:	Camera-ready version for EACL 2021
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1911.03821 [cs.CL]
	(or arXiv:1911.03821v2 [cs.CL] for this version)

Submission history

From: Gaurav Sahu
[v1] Sun, 10 Nov 2019 01:39:46 GMT (258kb,D)
[v2] Tue, 26 Jan 2021 08:08:02 GMT (7592kb,D)
