Globetrotter: Connecting Languages by Connecting Images

Surís, Dídac; Epstein, Dave; Vondrick, Carl



(Submitted on 8 Dec 2020 (v1), last revised 1 Apr 2022 (this version, v4))

Abstract: Machine translation between many languages at once is highly challenging, since training with ground truth requires supervision between all language pairs, which is difficult to obtain. Our key insight is that, while languages may vary drastically, the underlying visual appearance of the world remains consistent. We introduce a method that uses visual observations to bridge the gap between languages, rather than relying on parallel corpora or topological properties of the representations. We train a model that aligns segments of text from different languages if and only if the images associated with them are similar and each image in turn is well-aligned with its textual description. We train our model from scratch on a new dataset of text in over fifty languages with accompanying images. Experiments show that our method outperforms previous work on unsupervised word and sentence translation using retrieval. Code, models and data are available at globetrotter.cs.columbia.edu.
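The abstract describes the training signal only at a high level: two text segments should be aligned when their images are similar and each image matches its own caption, and translation is then done by retrieval. The sketch below illustrates that idea under stated assumptions. It assumes pre-computed text and image embeddings that live in one shared vector space, and it uses cosine similarity with a hard grounding threshold of 0.5. The function names, the threshold, and the gating rule are hypothetical illustrations, not the paper's actual objective or released code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def target_text_similarity(
    img_i: Vector, img_j: Vector, txt_i: Vector, txt_j: Vector,
    threshold: float = 0.5,  # hypothetical grounding threshold
) -> float:
    """Hypothetical target similarity for text i vs. text j.

    The target is the similarity of the two associated images, but only
    when BOTH captions are well aligned with their own image; otherwise
    the pair gets no alignment signal (0.0).
    """
    image_sim = cosine(img_i, img_j)
    grounding = min(cosine(img_i, txt_i), cosine(img_j, txt_j))
    return image_sim if grounding >= threshold else 0.0


def translate_by_retrieval(query: Vector, candidates: Sequence[Vector]) -> int:
    """Return the index of the candidate embedding closest to the query."""
    scores = [cosine(query, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for learned features (illustrative only).
    img_en, img_es = [1.0, 0.0], [0.9, 0.1]
    txt_en, txt_es = [1.0, 0.1], [0.95, 0.05]
    print(target_text_similarity(img_en, img_es, txt_en, txt_es))
    print(translate_by_retrieval([1.0, 0.0], [[0.0, 1.0], [0.8, 0.2], [-1.0, 0.0]]))
```

In this toy run, the two captions inherit the high similarity of their images because both are well grounded, while replacing either caption with one that does not match its image drops the target to 0.0. The retrieval call returns index 1, the candidate pointing in nearly the same direction as the query. A soft gate (for example, multiplying by the grounding scores) would be an equally plausible design. The hard threshold is used here only to keep the example easy to inspect.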

Comments:	CVPR 2022 (Oral)
Subjects:	Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2012.04631 [cs.CL]
	(or arXiv:2012.04631v4 [cs.CL] for this version)

Submission history

From: Didac Surís Coll-Vinent [view email]
[v1] Tue, 8 Dec 2020 18:50:40 GMT (30452kb,D)
[v2] Thu, 17 Mar 2022 22:37:07 GMT (30677kb,D)
[v3] Sun, 27 Mar 2022 20:19:44 GMT (30452kb,D)
[v4] Fri, 1 Apr 2022 03:41:40 GMT (30442kb,D)
