Understanding LLMs Requires More Than Statistical Generalization

Reizinger, Patrik; Ujváry, Szilvia; Mészáros, Anna; Kerekes, Anna; Brendel, Wieland; Huszár, Ferenc

(Submitted on 3 May 2024)

Abstract: The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that autoregressive (AR) probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart (and thus with equivalent test loss) can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.
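The core non-identifiability claim can be illustrated with a toy construction. The sketch below is our own illustration, not code from the paper; the training distribution `DATA`, the two models `model_a` and `model_b`, and their fallback rules are all hypothetical. Both models reproduce the data's next-token conditionals on every prefix the data can generate, so their sequence-level distributions coincide with the data and the KL divergence between them is exactly zero. They differ only on prefixes the data never produces (here, a prompt starting with `"b"`), which is where their behaviors diverge.

```python
import math
from typing import Callable, Dict, Tuple

Prefix = Tuple[str, ...]
NextToken = Callable[[Prefix], Dict[str, float]]

EOS = "<eos>"

# Hypothetical toy "training" distribution: "a b" and "a a b b",
# each with probability 0.5 (a tiny a^n b^n-style language).
DATA: Dict[Prefix, float] = {("a", "b", EOS): 0.5, ("a", "a", "b", "b", EOS): 0.5}


def data_conditional(prefix: Prefix) -> Dict[str, float]:
    """Next-token distribution implied by DATA; empty dict if prefix is off-support."""
    counts: Dict[str, float] = {}
    for seq, p in DATA.items():
        if seq[: len(prefix)] == prefix and len(seq) > len(prefix):
            tok = seq[len(prefix)]
            counts[tok] = counts.get(tok, 0.0) + p
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total > 0 else {}


def model_a(prefix: Prefix) -> Dict[str, float]:
    # Matches the data on-support; off-support it stops immediately (hypothetical choice).
    return data_conditional(prefix) or {EOS: 1.0}


def model_b(prefix: Prefix) -> Dict[str, float]:
    # Matches the data on-support; off-support it follows a different (hypothetical) rule.
    cond = data_conditional(prefix)
    if cond:
        return cond
    return {"a": 1.0} if prefix and prefix[-1] == "b" else {EOS: 1.0}


def sequence_distribution(model: NextToken, max_len: int = 10) -> Dict[Prefix, float]:
    """Enumerate complete sequences (ending in EOS) sampled from the empty prefix."""
    dist: Dict[Prefix, float] = {}
    frontier = [((), 1.0)]
    while frontier:
        prefix, p = frontier.pop()
        if len(prefix) >= max_len:
            continue  # truncate very long sequences (not reached in this toy)
        for tok, q in model(prefix).items():
            if q == 0:
                continue
            seq = prefix + (tok,)
            if tok == EOS:
                dist[seq] = dist.get(seq, 0.0) + p * q
            else:
                frontier.append((seq, p * q))
    return dist


def kl(p: Dict[Prefix, float], q: Dict[Prefix, float]) -> float:
    """KL(p || q) over sequences; infinite if q misses part of p's support."""
    total = 0.0
    for seq, ps in p.items():
        if ps > 0:
            qs = q.get(seq, 0.0)
            if qs == 0:
                return math.inf
            total += ps * math.log(ps / qs)
    return total


def greedy_continue(model: NextToken, prompt: Prefix, max_new: int = 5) -> Prefix:
    """Greedy decoding from a prompt, stopping at EOS or after max_new tokens."""
    out = []
    prefix = tuple(prompt)
    for _ in range(max_new):
        probs = model(prefix)
        tok = max(probs, key=probs.get)
        out.append(tok)
        if tok == EOS:
            break
        prefix += (tok,)
    return tuple(out)


if __name__ == "__main__":
    p_a, p_b = sequence_distribution(model_a), sequence_distribution(model_b)
    print(kl(p_a, p_b))                       # 0.0
    print(greedy_continue(model_a, ("b",)))   # ('<eos>',)
    print(greedy_continue(model_b, ("b",)))   # ('a', '<eos>')
```

In this toy, any loss computed on samples from `DATA` cannot distinguish the two models, yet they extrapolate differently on an unseen prompt. This is the kind of gap between statistical generalization and model behavior that the abstract describes.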

Comments:	Accepted at ICML 2024
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2405.01964 [stat.ML]
	(or arXiv:2405.01964v1 [stat.ML] for this version)

Submission history

From: Patrik Reizinger
[v1] Fri, 3 May 2024 09:41:39 GMT (152kb,D)
