Muppet: Massive Multi-task Representations with Pre-Finetuning

Aghajanyan, Armen; Gupta, Anchit; Shrivastava, Akshat; Chen, Xilun; Zettlemoyer, Luke; Gupta, Sonal

(Submitted on 26 Jan 2021)

Abstract: We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when few tasks are used, up until a critical point (usually above 15), after which performance improves linearly in the number of tasks.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2101.11038 [cs.CL]
	(or arXiv:2101.11038v1 [cs.CL] for this version)

Submission history

From: Armen Aghajanyan
[v1] Tue, 26 Jan 2021 19:18:27 GMT (7151kb,D)
