Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes

Dohmatob, Elvis

(Submitted on 4 Jun 2021)

Abstract: This work studies the (non)robustness of two-layer neural networks in various high-dimensional linearized regimes. We establish fundamental trade-offs between memorization and robustness, as measured by the Sobolev seminorm of the model with respect to the data distribution, i.e., the square root of the average squared $L_2$-norm of the gradient of the model with respect to its input. More precisely, if $n$ is the number of training examples, $d$ is the input dimension, and $k$ is the number of hidden neurons in a two-layer neural network, we prove for a large class of activation functions that, if the model memorizes even a fraction of the training set, then its Sobolev seminorm is lower-bounded by (i) $\sqrt{n}$ in the case of infinite-width random features (RF) or neural tangent kernel (NTK) with $d \gtrsim n$; (ii) $\sqrt{n}$ in the case of finite-width RF with proportionate scaling of $d$ and $k$; and (iii) $\sqrt{n/k}$ in the case of finite-width NTK with proportionate scaling of $d$ and $k$. Moreover, all of these lower bounds are tight: they are attained by the min-norm / least-squares interpolator (when $n$, $d$, and $k$ are in the appropriate interpolating regime). All our results hold as soon as the data is log-concave isotropic and there is label noise, i.e., the target variable is not a deterministic function of the data / features. We empirically validate our theoretical results with experiments. Incidentally, these experiments also reveal, for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
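
The robustness measure in the abstract can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the paper's experimental setup: it assumes a finite-width ReLU random-features model $f(x) = \sum_j a_j \max(w_j^\top x, 0)$, standard Gaussian inputs (one example of a log-concave isotropic distribution), pure-noise labels, and arbitrary sizes. The function names `sobolev_seminorm` and `min_norm_rf_interpolator` are hypothetical helpers introduced for this example. It fits the min-norm least-squares interpolator and then estimates the Sobolev seminorm by Monte Carlo on fresh samples.

```python
import numpy as np

def sobolev_seminorm(W, a, X):
    """Monte Carlo estimate of sqrt(E ||grad_x f(x)||_2^2) for
    f(x) = sum_j a_j * relu(w_j . x), using the rows of X as samples."""
    active = (X @ W.T > 0).astype(float)  # (m, k): derivative of relu at w_j . x
    grads = (active * a) @ W              # (m, d): gradient of f at each sample
    return np.sqrt(np.mean(np.sum(grads ** 2, axis=1)))

def min_norm_rf_interpolator(W, X_train, y_train):
    """Output weights of the min-norm least-squares fit on ReLU random features."""
    Z = np.maximum(X_train @ W.T, 0.0)    # (n, k) feature matrix
    return np.linalg.pinv(Z) @ y_train    # pseudo-inverse gives the min-norm solution

# Illustrative sizes (assumptions, not taken from the paper).
n, d, k = 50, 20, 400
rng = np.random.default_rng(0)

X_train = rng.standard_normal((n, d))     # isotropic Gaussian inputs
y_train = rng.standard_normal(n)          # labels are pure noise
W = rng.standard_normal((k, d)) / np.sqrt(d)  # fixed random hidden weights

a = min_norm_rf_interpolator(W, X_train, y_train)
train_pred = np.maximum(X_train @ W.T, 0.0) @ a

X_fresh = rng.standard_normal((5000, d))  # fresh samples for the expectation
seminorm = sobolev_seminorm(W, a, X_fresh)
print("max training residual:", np.max(np.abs(train_pred - y_train)))
print("estimated Sobolev seminorm:", seminorm)
```

Re-running the sketch while varying `n` with `d` and `k` held fixed gives a rough way to see how the estimated seminorm of the interpolator changes with the number of memorized examples; it is not a substitute for the paper's experiments.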

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2106.02630 [stat.ML]
	(or arXiv:2106.02630v1 [stat.ML] for this version)

Submission history

From: Elvis Dohmatob
[v1] Fri, 4 Jun 2021 17:52:50 GMT (3173kb,D)
