Title: Many-to-English Machine Translation Tools, Data, and Pretrained Models

Abstract: While there are more than 7000 languages in the world, most translation research efforts have targeted a few high-resource languages. Commercial translation systems support only one hundred languages or fewer, and do not make their models available for transfer to low-resource languages. In this work, we present useful tools for machine translation research: MTData, NLCodec, and RTG. We demonstrate their usefulness by creating a multilingual neural machine translation model capable of translating from 500 source languages to English. We make this multilingual model readily downloadable and usable as a service, or as a parent model for transfer learning to even lower-resource languages.
Comments: To appear in ACL 2021 System Demonstrations
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2104.00290 [cs.CL]
  (or arXiv:2104.00290v2 [cs.CL] for this version)

Submission history

From: Thamme Gowda
[v1] Thu, 1 Apr 2021 06:55:12 GMT (309kb,D)
[v2] Thu, 1 Jul 2021 19:40:00 GMT (630kb,D)
