Provable Generalization of SGD-trained Neural Networks of Any Width in the Presence of Adversarial Label Noise

Frei, Spencer; Cao, Yuan; Gu, Quanquan

(Submitted on 4 Jan 2021 (v1), last revised 15 Feb 2021 (this version, v3))

Abstract: We consider a one-hidden-layer leaky ReLU network of arbitrary width trained by stochastic gradient descent (SGD) following an arbitrary initialization. We prove that SGD produces neural networks that have classification accuracy competitive with that of the best halfspace over the distribution for a broad class of distributions that includes log-concave isotropic and hard margin distributions. Equivalently, such networks can generalize when the data distribution is linearly separable but corrupted with adversarial label noise, despite the capacity to overfit. To the best of our knowledge, this is the first work to show that overparameterized neural networks trained by SGD can generalize when the data is corrupted with adversarial label noise.
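
The setting in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' code or experimental setup, and it demonstrates nothing about the theorem itself. Every concrete choice here is an assumption made for illustration:

- Inputs are isotropic Gaussian, one member of the isotropic log-concave class.
- Labels come from the halfspace `v = e_1`. The `noise_rate` fraction of points closest to its boundary have their labels flipped, which stands in for adversarial label noise.
- The network is `f(x) = sum_j a_j * leaky_relu(<w_j, x>)`. The output weights `a_j` are random, fixed at ±1/sqrt(m), and not trained.
- Training uses the logistic loss and a single pass of online SGD from a Gaussian initialization.
- The width `m`, the leaky slope `alpha` and the step size `lr` are hypothetical values.

The test error of `v` upper-bounds the best-halfspace error, so it serves as a reference point for the network.

```python
import numpy as np


def leaky_relu(z, alpha):
    return np.where(z > 0, z, alpha * z)


def make_data(n, d, noise_rate, rng):
    """Isotropic Gaussian inputs labelled by sign(<v, x>) with v = e_1 (assumption).
    Labels of the noise_rate fraction of points nearest the boundary are flipped,
    a simple stand-in for adversarial label noise (assumption)."""
    x = rng.standard_normal((n, d))
    v = np.zeros(d)
    v[0] = 1.0
    margin = x @ v
    y = np.where(margin >= 0, 1.0, -1.0)
    k = int(noise_rate * n)
    y[np.argsort(np.abs(margin))[:k]] *= -1.0
    return x, y, v


def forward(W, a, x, alpha):
    """f(x) = sum_j a_j * leaky_relu(<w_j, x>) for a batch x of shape (n, d)."""
    return leaky_relu(x @ W.T, alpha) @ a


def sgd_train(x, y, m, alpha, lr, rng):
    """One pass of online SGD on the logistic loss, training only the first layer
    (the fixed output layer is an assumption, not taken from the paper)."""
    n, d = x.shape
    W = rng.standard_normal((m, d)) / np.sqrt(d)       # one possible initialization
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed random output weights
    for i in rng.permutation(n):                       # one sample per SGD step
        pre = W @ x[i]
        f = leaky_relu(pre, alpha) @ a
        # d/df log(1 + exp(-y f)) = -y / (1 + exp(y f)), written via tanh for stability
        g = -y[i] * 0.5 * (1.0 - np.tanh(0.5 * y[i] * f))
        dact = np.where(pre > 0, 1.0, alpha)           # leaky ReLU derivative
        W -= lr * g * np.outer(a * dact, x[i])         # gradient step on all w_j
    return W, a


def zero_one_error(scores, y):
    """Fraction of points whose predicted sign disagrees with the label."""
    return float(np.mean(np.where(scores >= 0, 1.0, -1.0) != y))


rng = np.random.default_rng(0)
d, m, alpha, noise = 20, 512, 0.1, 0.05                # hypothetical settings
x_tr, y_tr, v = make_data(20_000, d, noise, rng)
x_te, y_te, _ = make_data(5_000, d, noise, rng)
W, a = sgd_train(x_tr, y_tr, m, alpha, lr=0.05, rng=rng)
net_err = zero_one_error(forward(W, a, x_te, alpha), y_te)
halfspace_err = zero_one_error(x_te @ v, y_te)         # upper bound on best-halfspace error
print(f"network test error {net_err:.3f}, halfspace v test error {halfspace_err:.3f}")
```

Because the flipped points are exactly those nearest the boundary of `v`, the test error of `v` equals the noise rate by construction. The quantity to look at is how close the network's test error comes to it. Varying `m` or the initialization scale gives a quick empirical look at the "any width, arbitrary initialization" setting, but such runs are only illustrative.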

Comments:	30 pages, 10 figures
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:2101.01152 [cs.LG]
	(or arXiv:2101.01152v3 [cs.LG] for this version)

Submission history

From: Quanquan Gu
[v1] Mon, 4 Jan 2021 18:32:49 GMT (6184kb,D)
[v2] Thu, 14 Jan 2021 18:57:11 GMT (8084kb,D)
[v3] Mon, 15 Feb 2021 18:57:47 GMT (9903kb,D)