Title: Synthetic Source Language Augmentation for Colloquial Neural Machine Translation

Abstract: Neural machine translation (NMT) is typically domain-dependent and style-dependent, and it requires large amounts of training data. State-of-the-art NMT models often fall short in handling colloquial variations of their source language, and the lack of parallel data for such variations is a major hurdle to systematically improving existing models. In this work, we develop a novel colloquial Indonesian-English test set collected from YouTube transcripts and Twitter. We apply synthetic style augmentation to the formal Indonesian source text and show that it improves baseline Id-En models (in BLEU) on the new test data.
Comments: 5 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
MSC classes: 68T50
ACM classes: I.2.7; I.2.6
Cite as: arXiv:2012.15178 [cs.CL]
  (or arXiv:2012.15178v1 [cs.CL] for this version)

Submission history

From: Asrul Sani Ariesandy
[v1] Wed, 30 Dec 2020 14:52:15 GMT (30kb,D)