Title: Latent Variable Model for Multi-modal Translation

Abstract: In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model. This latent variable can be seen as a multi-modal stochastic embedding of an image and its description in a foreign language. It is used in a target-language decoder and also to predict image features. Importantly, our model formulation utilises visual and textual inputs during training but does not require that images be available at test time. We show that our latent variable MMT formulation improves considerably over strong baselines, including a multi-task learning approach (Elliott and Kádár, 2017) and a conditional variational auto-encoder approach (Toyama et al., 2016). Finally, we show improvements due to (i) predicting image features in addition to only conditioning on them, (ii) imposing a constraint on the minimum amount of information encoded in the latent variable, and (iii) training on additional target-language image descriptions (i.e. synthetic data).
Comments: Paper accepted at ACL 2019. Contains 8 pages (11 including references, 13 including appendix), 6 figures
Subjects: Computation and Language (cs.CL)
ACM classes: I.2.7
Cite as: arXiv:1811.00357 [cs.CL]
  (or arXiv:1811.00357v2 [cs.CL] for this version)

Submission history

From: Iacer Calixto
[v1] Thu, 1 Nov 2018 13:19:27 GMT (104kb)
[v2] Thu, 16 May 2019 16:56:25 GMT (269kb,D)
