Understanding and Improving Lexical Choice in Non-Autoregressive Translation

Ding, Liang; Wang, Longyue; Liu, Xuebo; Wong, Derek F.; Tao, Dacheng; Tu, Zhaopeng

(Submitted on 29 Dec 2020 (v1), last revised 27 Jan 2021 (this version, v2))

Abstract: Knowledge distillation (KD) is essential for training non-autoregressive translation (NAT) models by reducing the complexity of the raw data with an autoregressive teacher model. In this study, we empirically show that as a side effect of this training, the lexical choice errors on low-frequency words are propagated to the NAT model from the teacher model. To alleviate this problem, we propose to expose the raw data to NAT models to restore the useful information of low-frequency words, which are missed in the distilled data. To this end, we introduce an extra Kullback-Leibler divergence term derived by comparing the lexical choice of NAT model and that embedded in the raw data. Experimental results across language pairs and model architectures demonstrate the effectiveness and universality of the proposed approach. Extensive analyses confirm our claim that our approach improves performance by reducing the lexical choice errors on low-frequency words. Encouragingly, our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively. The source code will be released.
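
The abstract names a Kullback-Leibler divergence term but does not give its form. As a rough illustration only, the sketch below computes a generic KL divergence between two discrete distributions over candidate target words. The function name `kl_divergence`, the variable names, and the toy probabilities are all hypothetical and are not taken from the paper; how the authors actually derive the prior from the raw data and weight the term during training is not specified here.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for discrete distributions given as dicts word -> probability.

    Generic textbook definition; an assumption for illustration, not the paper's exact loss.
    """
    total = 0.0
    for word, p_w in p.items():
        if p_w == 0.0:
            continue  # words with zero mass in p contribute nothing to the sum
        q_w = max(q.get(word, 0.0), eps)  # clamp so the logarithm stays finite
        total += p_w * math.log(p_w / q_w)
    return total


# Hypothetical toy distributions over translations of a single source word.
# "raw_prior" stands in for lexical statistics estimated from raw data;
# "model_probs" stands in for the NAT model's predicted distribution.
raw_prior = {"bank": 0.6, "shore": 0.3, "riverside": 0.1}
model_probs = {"bank": 0.9, "shore": 0.09, "riverside": 0.01}

penalty = kl_divergence(raw_prior, model_probs)
```

In this toy setup the penalty grows as the model shifts probability mass away from the rarer candidates that the raw-data prior still supports, which is the intuition the abstract describes for low-frequency words.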

Comments:	ICLR 2021
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2012.14583 [cs.CL]
	(or arXiv:2012.14583v2 [cs.CL] for this version)

Submission history

From: Liang Ding [view email]
[v1] Tue, 29 Dec 2020 03:18:50 GMT (1834kb,D)
[v2] Wed, 27 Jan 2021 07:22:16 GMT (1834kb,D)

