Title: Enhanced back-translation for low resource neural machine translation using self-training

Abstract: Training neural machine translation (NMT) models on back-translations of monolingual target data (synthetic parallel data) is currently the state-of-the-art approach for improving translation systems. Many studies have shown that the quality of the backward system, which is trained on the available parallel data and used for the back-translation, affects the performance of the final NMT model. In low-resource conditions, the available parallel data is usually not enough to train a backward model that can produce synthetic data of the quality needed to train a standard translation model. This work proposes a self-training strategy in which the output of the backward model is used to improve the model itself through the forward translation technique. The technique improved baseline low-resource IWSLT'14 English-German and IWSLT'15 English-Vietnamese backward translation models by 11.06 and 1.5 BLEU, respectively. The synthetic data generated by the improved English-German backward model was used to train a forward model that outperformed another forward model trained using standard back-translation by 2.7 BLEU.
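
The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `train`, `translate`, `self_train_backward` and `back_translate` are hypothetical names, and the "model" is a toy word-level lexicon standing in for a real NMT system. Only the order of the steps (train the backward model, translate monolingual target data with it, retrain it on its own output, then back-translate for the forward model) follows the abstract.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]   # (input sentence, output sentence)
Model = Dict[str, str]   # toy word lexicon; an assumption standing in for an NMT model


def train(pairs: List[Pair]) -> Model:
    """Toy stand-in for NMT training: map each input word to the output
    word it most often shares a position with."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for inp, out in pairs:
        for i, o in zip(inp.split(), out.split()):
            counts[i][o] += 1
    return {i: c.most_common(1)[0][0] for i, c in counts.items()}


def translate(model: Model, sentence: str) -> str:
    """Translate word by word; unknown words are copied through unchanged."""
    return " ".join(model.get(w, w) for w in sentence.split())


def self_train_backward(parallel: List[Pair], mono_target: List[str]) -> Model:
    """Self-training step: the backward (target -> source) model translates
    monolingual target text, then is retrained on authentic plus synthetic pairs."""
    backward_pairs = [(tgt, src) for src, tgt in parallel]
    backward = train(backward_pairs)
    # Forward translation from the backward model's point of view:
    # authentic input (target side), synthetic output (source side).
    synthetic = [(t, translate(backward, t)) for t in mono_target]
    return train(backward_pairs + synthetic)


def back_translate(backward: Model, parallel: List[Pair],
                   mono_target: List[str]) -> Model:
    """Standard back-translation: train the forward (source -> target) model
    on authentic pairs plus (synthetic source, authentic target) pairs."""
    synthetic = [(translate(backward, t), t) for t in mono_target]
    return train(parallel + synthetic)
```

With a real system, `train` and `translate` would be replaced by calls into an NMT toolkit; the toy lexicon only makes the pipeline executable end to end and says nothing about the BLEU gains reported above.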
Comments: 17 pages, 3 figures, 5 tables; Accepted for publication in the International Conference on Information and Communication Technology and Applications (ICTA 2020)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
DOI: 10.1007/978-3-030-69143-1_28
Cite as: arXiv:2006.02876 [cs.CL]
  (or arXiv:2006.02876v3 [cs.CL] for this version)

Submission history

From: Idris Abdulmumin
[v1] Thu, 4 Jun 2020 14:19:52 GMT (139kb)
[v2] Mon, 16 Nov 2020 18:35:33 GMT (741kb,D)
[v3] Thu, 24 Dec 2020 10:35:31 GMT (606kb,D)