Title: On the Copying Behaviors of Pre-Training for Neural Machine Translation

Abstract: Previous studies have shown that initializing neural machine translation (NMT) models with pre-trained language models (LMs) can speed up training and boost performance. In this work, we identify a critical side effect of pre-training for NMT, which stems from the discrepancy between the training objectives of LM-based pre-training and NMT. Because the LM objective learns to reconstruct a few source tokens and copy most of them, pre-training initialization affects the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training-based NMT models have a larger copying ratio than standard NMT models. In response to this problem, we propose a simple and effective method named copying penalty to control copying behaviors in decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty method consistently improves translation performance by controlling copying behaviors for pre-training-based NMT models. Source code is freely available at this https URL
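
The abstract names a copying ratio metric but does not define it here. As a rough illustration only, the sketch below assumes a token-level definition: the share of output tokens that also occur in the corresponding source sentence, pooled over a corpus. The function name `copying_ratio`, the whitespace tokenization, and the count-matching rule are all assumptions made for this example and may differ from the paper's actual formulation.

```python
from collections import Counter


def copying_ratio(sources, hypotheses):
    """Fraction of hypothesis tokens that also appear in the paired source.

    Assumed definition for illustration; not taken from the paper.
    Tokens are split on whitespace, and each source token can be
    matched at most once per occurrence.
    """
    copied = 0
    total = 0
    for src, hyp in zip(sources, hypotheses):
        remaining = Counter(src.split())  # source tokens still available to match
        for tok in hyp.split():
            total += 1
            if remaining[tok] > 0:
                remaining[tok] -= 1
                copied += 1
    # An empty corpus has no tokens to copy, so report 0.0 rather than divide by zero.
    return copied / total if total else 0.0


if __name__ == "__main__":
    srcs = ["das ist Berlin"]
    hyps = ["this is Berlin"]
    print(copying_ratio(srcs, hyps))
```

Under this assumed definition, comparing the value for a pre-trained model's outputs against a randomly initialized baseline on the same test set would give one way to see the gap the abstract describes. Copied named entities and numbers count toward the ratio, so a nonzero value is expected even for good translations.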
Comments: Accepted to Findings of ACL 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2107.08212 [cs.CL]
  (or arXiv:2107.08212v1 [cs.CL] for this version)

Submission history

From: Xuebo Liu
[v1] Sat, 17 Jul 2021 10:02:30 GMT (1152kb,D)
