Test-Time Training on Nearest Neighbors for Large Language Models

Hardt, Moritz; Sun, Yu

(Submitted on 29 May 2023 (v1), last revised 2 Feb 2024 (this version, v3))

Abstract: Many recent efforts augment language models with retrieval, by adding retrieved data to the input context. For this approach to succeed, the retrieved data must be added at both training and test time. Moreover, as input length grows linearly with the size of retrieved data, cost in computation and memory grows quadratically for modern Transformers. To avoid these complications, we simply fine-tune the model on retrieved data at test time, using its standard training setup. We build a large-scale distributed index based on text embeddings of the Pile dataset. For each test input, our system retrieves its neighbors and fine-tunes the model on their text. Surprisingly, retrieving and training on as few as 20 neighbors, each for only one gradient iteration, drastically improves performance across more than 20 language modeling tasks in the Pile. For example, test-time training with nearest neighbors significantly narrows the performance gap between a small GPT-2 and a GPT-Neo model more than 10 times larger. Sufficient index quality and size, however, are necessary. Our work establishes a first baseline of test-time training for language modeling.

Comments:	ICLR final version
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2305.18466 [cs.CL]
	(or arXiv:2305.18466v3 [cs.CL] for this version)

Submission history

From: Yu Sun
[v1] Mon, 29 May 2023 08:03:28 GMT (239kb,D)
[v2] Wed, 7 Jun 2023 06:21:30 GMT (286kb,D)
[v3] Fri, 2 Feb 2024 20:28:27 GMT (309kb,D)
