Title: Should All Cross-Lingual Embeddings Speak English?

Abstract: Most recent work on cross-lingual word embeddings is severely Anglocentric. The vast majority of lexicon induction evaluation dictionaries are between English and another language, and the English embedding space is selected by default as the hub when learning in a multilingual setting. With this work, however, we challenge these practices. First, we show that the choice of hub language can significantly impact downstream lexicon induction performance. Second, we both expand the current evaluation dictionary collection to include all language pairs using triangulation and create new dictionaries for under-represented languages. Evaluating established methods over all these language pairs sheds light on their suitability and presents new challenges for the field. Finally, in our analysis we identify general guidelines for strong cross-lingual embedding baselines, based on more than just Anglocentric experiments.
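The abstract mentions expanding English-centric dictionaries to all language pairs "using triangulation". One common way to do this is to compose two bilingual dictionaries through a shared pivot language (here, English): a source word and a target word are paired if they translate to the same pivot word. The Python sketch below illustrates that general idea only. It assumes dictionaries are given as lists of word pairs, and the function name `triangulate` and the example entries are hypothetical. The paper's actual procedure may add filtering that is not shown here.

```python
from collections import defaultdict

def triangulate(src_pivot, pivot_tgt):
    """Compose two bilingual dictionaries through a shared pivot language.

    Hypothetical helper, for illustration only.
    src_pivot: iterable of (source_word, pivot_word) pairs, e.g. German-English
    pivot_tgt: iterable of (pivot_word, target_word) pairs, e.g. English-Italian
    Returns the set of (source_word, target_word) pairs that share at least
    one pivot word.
    """
    # Index the second dictionary by its pivot-side word.
    by_pivot = defaultdict(set)
    for pivot, tgt in pivot_tgt:
        by_pivot[pivot].add(tgt)

    # Join the two dictionaries on the pivot word.
    result = set()
    for src, pivot in src_pivot:
        for tgt in by_pivot.get(pivot, ()):
            result.add((src, tgt))
    return result

# Usage with made-up entries: German-English composed with English-Italian.
de_en = [("Hund", "dog"), ("Katze", "cat")]
en_it = [("dog", "cane"), ("cat", "gatto"), ("house", "casa")]
de_it = triangulate(de_en, en_it)
```

Because every pivot translation of a word is followed, an ambiguous pivot word (for example, English "bank") can link source and target words with unrelated senses. A practical pipeline would therefore likely filter the composed pairs.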
Comments: accepted for publication at ACL 2020
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:1911.03058 [cs.CL]
  (or arXiv:1911.03058v2 [cs.CL] for this version)

Submission history

From: Antonios Anastasopoulos
[v1] Fri, 8 Nov 2019 05:29:57 GMT (89kb)
[v2] Sun, 5 Apr 2020 03:22:36 GMT (101kb)