EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets

Chen, Xiaohan; Cheng, Yu; Wang, Shuohang; Gan, Zhe; Wang, Zhangyang; Liu, Jingjing

(Submitted on 31 Dec 2020 (this version), latest version 7 Jun 2021 (v2))

Abstract: Deep, heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks. However, their high model complexity requires enormous computational resources and extremely long training time for both pre-training and fine-tuning. Many works have studied model compression for large NLP models, but they focus only on reducing inference cost and time, while still requiring an expensive training process. Other works use extremely large batch sizes to shorten pre-training time, at the expense of a high demand for computational resources. In this paper, inspired by the Early-Bird Lottery Tickets studied for computer vision tasks, we propose EarlyBERT, a general, computationally efficient training algorithm applicable to both pre-training and fine-tuning of large-scale language models. We are the first to identify structured winning tickets in the early stage of BERT training and to use them for efficient training. Comprehensive pre-training and fine-tuning experiments on GLUE and SQuAD downstream tasks show that EarlyBERT easily achieves performance comparable to standard BERT with 35-45% less training time.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2101.00063 [cs.CL]
	(or arXiv:2101.00063v1 [cs.CL] for this version)

Submission history

From: Xiaohan Chen
[v1] Thu, 31 Dec 2020 20:38:20 GMT (2174kb,D)
[v2] Mon, 7 Jun 2021 18:26:28 GMT (10177kb,D)
