We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CL

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computation and Language

Title: Muppet: Massive Multi-task Representations with Pre-Finetuning

Abstract: We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g.~RoBERTa) and generation models (e.g.~BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when few tasks are used up until a critical point (usually above 15) after which performance improves linearly in the number of tasks.
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2101.11038 [cs.CL]
  (or arXiv:2101.11038v1 [cs.CL] for this version)

Submission history

From: Armen Aghajanyan [view email]
[v1] Tue, 26 Jan 2021 19:18:27 GMT (7151kb,D)

Link back to: arXiv, form interface, contact.