
Title: Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation

Abstract: We demonstrate a new approach to Neural Machine Translation (NMT) for low-resource languages using a ubiquitous linguistic resource, Interlinear Glossed Text (IGT). IGT represents a non-English sentence as a sequence of English lemmas and morpheme labels. As such, it can serve as a pivot or interlingua for NMT. Our contribution is four-fold. Firstly, we pool IGT for 1,497 languages in ODIN (54,545 glosses) and 70,918 glosses in Arapaho and train a gloss-to-target NMT system from IGT to English, which reaches a BLEU score of 25.94. Secondly, we introduce a multilingual NMT model that tags all glossed text with gloss-source language tags and train a universal system with shared attention across 1,497 languages. Thirdly, we use the IGT gloss-to-target translation as a key step in an English-Turkish MT system trained on only 865 lines from ODIN. Finally, we present five metrics for evaluating extremely low-resource translation when BLEU is no longer sufficient. We evaluate the Turkish low-resource system using BLEU and also using accuracy of matching nouns, verbs, agreement, tense, and spurious repetition, showing large improvements.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:1911.02709 [cs.CL]
  (or arXiv:1911.02709v3 [cs.CL] for this version)

Submission history

From: Zhong Zhou
[v1] Thu, 7 Nov 2019 01:45:33 GMT (33kb)
[v2] Sun, 16 Feb 2020 05:51:32 GMT (207kb,D)
[v3] Tue, 3 Mar 2020 14:57:40 GMT (207kb,D)
