References & Citations
Computer Science > Computation and Language
Title: Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation
(Submitted on 7 Nov 2019 (v1), last revised 3 Mar 2020 (this version, v3))
Abstract: We demonstrate a new approach to Neural Machine Translation (NMT) for low-resource languages using a ubiquitous linguistic resource, Interlinear Glossed Text (IGT). IGT represents a non-English sentence as a sequence of English lemmas and morpheme labels. As such, it can serve as a pivot or interlingua for NMT. Our contribution is four-fold. Firstly, we pool IGT for 1,497 languages in ODIN (54,545 glosses) and 70,918 glosses in Arapaho and train a gloss-to-target NMT system from IGT to English, with a BLEU score of 25.94. We introduce a multilingual NMT model that tags all glossed text with gloss-source language tags and train a universal system with shared attention across 1,497 languages. Secondly, we use the IGT gloss-to-target translation as a key step in an English-Turkish MT system trained on only 865 lines from ODIN. Thirdly, we we present five metrics for evaluating extremely low-resource translation when BLEU is no longer sufficient and evaluate the Turkish low-resource system using BLEU and also using accuracy of matching nouns, verbs, agreement, tense, and spurious repetition, showing large improvements.
Submission history
From: Zhong Zhou [view email][v1] Thu, 7 Nov 2019 01:45:33 GMT (33kb)
[v2] Sun, 16 Feb 2020 05:51:32 GMT (207kb,D)
[v3] Tue, 3 Mar 2020 14:57:40 GMT (207kb,D)
Link back to: arXiv, form interface, contact.