Title: StreaMulT: Streaming Multimodal Transformer for Heterogeneous and Arbitrary Long Sequential Data

Authors: Victor Pellegrain (1 and 2), Myriam Tami (2), Michel Batteux (1), Céline Hudelot (2) ((1) Institut de Recherche Technologique SystemX, (2) Université Paris-Saclay, CentraleSupélec, MICS)
Abstract: This paper tackles the problem of efficiently processing and combining arbitrarily long data streams that come from different modalities with different acquisition frequencies. Typical applications include long-term monitoring of industrial or real-life systems from heterogeneous multimodal data (sensor data, monitoring reports, images, etc.). To address this problem, we propose StreaMulT, a Streaming Multimodal Transformer that relies on cross-modal attention and an augmented memory bank to process arbitrarily long input sequences at training time and to run in a streaming fashion at inference. StreaMulT reproduces state-of-the-art results on the CMU-MOSEI dataset while handling much longer inputs than other models, such as the previous Multimodal Transformer.
Comments: 5 pages, 4 figures, submitted to ICASSP 2022
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Multimedia (cs.MM)
Cite as: arXiv:2110.08021 [cs.LG]
  (or arXiv:2110.08021v1 [cs.LG] for this version)

Submission history

From: Victor Pellegrain
[v1] Fri, 15 Oct 2021 11:32:17 GMT (2590kb,D)
