Title: Very Deep Transformers for Neural Machine Translation

Abstract: We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: this https URL
Comments: 6 pages, 3 figures and 4 tables. V2 includes the back-translation results
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2008.07772 [cs.CL]
  (or arXiv:2008.07772v2 [cs.CL] for this version)

Submission history

From: Xiaodong Liu
[v1] Tue, 18 Aug 2020 07:14:54 GMT (600kb,D)
[v2] Wed, 14 Oct 2020 22:56:32 GMT (607kb,D)
