Title: Very Deep Transformers for Neural Machine Translation
(Submitted on 18 Aug 2020 (v1), last revised 14 Oct 2020 (this version, v2))
Abstract: We explore the application of very deep Transformer models for Neural Machine Translation (NMT). Using a simple yet effective initialization technique that stabilizes training, we show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers. These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU, and achieve new state-of-the-art benchmark results on WMT14 English-French (43.8 BLEU and 46.4 BLEU with back-translation) and WMT14 English-German (30.1 BLEU). The code and trained models will be publicly available at: this https URL
Submission history
From: Xiaodong Liu
[v1] Tue, 18 Aug 2020 07:14:54 GMT (600kb,D)
[v2] Wed, 14 Oct 2020 22:56:32 GMT (607kb,D)