Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

Liu, Haokun; Tam, Derek; Muqeeth, Mohammed; Mohta, Jay; Huang, Tenghao; Bansal, Mohit; Raffel, Colin

Full-text links:

Download:

Current browse context:

cs.LG

< prev | next >

new | recent | 2205

Computer Science > Machine Learning

Title: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

Authors: Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, Colin Raffel

(Submitted on 11 May 2022 (v1), last revised 26 Aug 2022 (this version, v2))

Abstract: Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unseen task without any gradient-based training by feeding a small number of training examples as part of the input. ICL incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made. Parameter-efficient fine-tuning (PEFT) (e.g. adapter modules, prompt tuning, sparse update methods, etc.) offers an alternative paradigm where a small set of parameters are trained to enable a model to perform the new task. In this paper, we rigorously compare few-shot ICL and PEFT and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs. Along the way, we introduce a new PEFT method called (IA)$^3$ that scales activations by learned vectors, attaining stronger performance while only introducing a relatively tiny amount of new parameters. We also propose a simple recipe based on the T0 model called T-Few that can be applied to new tasks without task-specific tuning or modifications. We validate the effectiveness of T-Few on completely unseen tasks by applying it to the RAFT benchmark, attaining super-human performance for the first time and outperforming the state-of-the-art by 6% absolute. All of the code used in our experiments is publicly available.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2205.05638 [cs.LG]
	(or arXiv:2205.05638v2 [cs.LG] for this version)

Submission history

From: Colin Raffel [view email]
[v1] Wed, 11 May 2022 17:10:41 GMT (106kb,D)
[v2] Fri, 26 Aug 2022 16:23:29 GMT (120kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2205.05638

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Machine Learning

Title: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

Submission history