We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:


Current browse context:


Change to browse by:


References & Citations

DBLP - CS Bibliography


(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Computation and Language

Title: DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers

Abstract: The computational benefits of iterative non-autoregressive transformers decrease as the number of decoding steps increases. As a remedy, we introduce Distill Multiple Steps (DiMS), a simple yet effective distillation technique to decrease the number of required steps to reach a certain translation quality. The distilled model enjoys the computational benefits of early iterations while preserving the enhancements from several iterative steps. DiMS relies on two models namely student and teacher. The student is optimized to predict the output of the teacher after multiple decoding steps while the teacher follows the student via a slow-moving average. The moving average keeps the teacher's knowledge updated and enhances the quality of the labels provided by the teacher. During inference, the student is used for translation and no additional computation is added. We verify the effectiveness of DiMS on various models obtaining improvements of up to 7 BLEU points on distilled and 12 BLEU points on raw WMT datasets for single-step translation. We release our code at this https URL
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2206.02999 [cs.CL]
  (or arXiv:2206.02999v1 [cs.CL] for this version)

Submission history

From: Sajad Norouzi [view email]
[v1] Tue, 7 Jun 2022 04:25:41 GMT (463kb,D)

Link back to: arXiv, form interface, contact.