Title: Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks

Abstract: Generalization beyond a training dataset is a main goal of machine learning, but theoretical understanding of generalization remains an open problem for many models. The need for a new theory is exacerbated by recent observations in deep neural networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. In this paper, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also includes infinitely overparameterized neural networks trained with gradient descent. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel or data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep neural networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with "simple functions", which are identified by solving a kernel eigenfunction problem on the data distribution. This notion of simplicity allows us to characterize whether a kernel is compatible with a learning task, facilitating good generalization performance from a small number of training examples. We show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks. To further understand these phenomena, we turn to the broad class of rotation invariant kernels, which is relevant to training deep neural networks in the infinite-width limit, and present a detailed mathematical analysis of them when data is drawn from a spherically symmetric distribution and the number of input dimensions is large.
Comments: Accepted for publication in Nature Communications. SI Eq. 71 is corrected.
Subjects: Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Machine Learning (cs.LG)
DOI: 10.1038/s41467-021-23103-1
Cite as: arXiv:2006.13198 [stat.ML]
  (or arXiv:2006.13198v6 [stat.ML] for this version)
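
Illustration: the abstract describes a bias of kernel regression toward "simple functions" identified by a kernel eigenfunction problem on the data distribution. The sketch below is a small numerical illustration of that idea only, not the paper's analytical theory or the authors' code. Every concrete choice in it is an assumption made for the example: a Gaussian kernel of width 1, inputs drawn uniformly from the unit circle, a ridge of 1e-3, 20 training points, and cosine targets of frequency 1 and 8. It estimates the kernel spectrum from the Gram matrix of a sample and compares the test error of kernel ridge regression on a low-frequency and a high-frequency target.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_circle(n):
    """Draw n points uniformly from the unit circle (assumed data distribution)."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return theta, np.stack([np.cos(theta), np.sin(theta)], axis=1)


def rbf_kernel(a, b, width=1.0):
    """Gaussian kernel; it depends only on |a - b|, so it is rotation invariant."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dist / (2.0 * width**2))


def kernel_ridge_predict(x_train, y_train, x_test, ridge=1e-3):
    """Kernel ridge regression: solve (K + ridge * I) alpha = y, then predict."""
    gram = rbf_kernel(x_train, x_train)
    alpha = np.linalg.solve(gram + ridge * np.eye(len(x_train)), y_train)
    return rbf_kernel(x_test, x_train) @ alpha


# Empirical stand-in for the kernel eigenvalue problem on the data
# distribution: eigenvalues of the Gram matrix divided by the sample size.
_, x_big = sample_circle(400)
eigvals = np.linalg.eigvalsh(rbf_kernel(x_big, x_big) / len(x_big))[::-1]

# Small training set, larger held-out test set.
theta_tr, x_tr = sample_circle(20)
theta_te, x_te = sample_circle(500)


def generalization_error(freq):
    """Test mean squared error when the target is cos(freq * theta)."""
    pred = kernel_ridge_predict(x_tr, np.cos(freq * theta_tr), x_te)
    return float(np.mean((pred - np.cos(freq * theta_te)) ** 2))


mse_low = generalization_error(1)   # target aligned with large-eigenvalue modes
mse_high = generalization_error(8)  # target aligned with tiny-eigenvalue modes

print("leading eigenvalue estimates:", eigvals[:7])
print("test MSE, frequency 1:", mse_low)
print("test MSE, frequency 8:", mse_high)
```

In this toy setting the eigenfunctions are Fourier modes on the circle and the eigenvalues fall off quickly with frequency, so a target built from a high-frequency mode is poorly aligned with the kernel and is learned much less well from the same 20 samples.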

Submission history

From: Abdulkadir Canatar
[v1] Tue, 23 Jun 2020 17:53:11 GMT (2677kb,D)
[v2] Tue, 7 Jul 2020 02:13:57 GMT (3022kb,D)
[v3] Sat, 31 Oct 2020 22:41:17 GMT (3389kb,D)
[v4] Tue, 23 Feb 2021 01:30:51 GMT (4508kb,D)
[v5] Mon, 19 Apr 2021 04:13:23 GMT (7712kb,D)
[v6] Fri, 4 Feb 2022 21:25:17 GMT (7712kb,D)