Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

Li, Zhiyuan; Zhang, Yi; Arora, Sanjeev

(Submitted on 16 Oct 2020 (v1), last revised 4 May 2021 (this version, v2))

Abstract: Convolutional neural networks often dominate their fully-connected counterparts in generalization performance, especially on image classification tasks. This is often explained in terms of 'better inductive bias'. However, this has not been made mathematically rigorous, and the hurdle is that the fully-connected net can always simulate the convolutional net (for a fixed task). Thus the training algorithm plays a role. The current work describes a natural task on which a provable sample complexity gap can be shown, for standard training algorithms. We construct a single natural distribution on $\mathbb{R}^d\times\{\pm 1\}$ on which any orthogonal-invariant algorithm (i.e. fully-connected networks trained with most gradient-based methods from Gaussian initialization) requires $\Omega(d^2)$ samples to generalize, while $O(1)$ samples suffice for convolutional architectures. Furthermore, we demonstrate a single target function such that learning it on all possible distributions leads to an $O(1)$ vs $\Omega(d^2/\varepsilon)$ gap. The proof relies on the fact that SGD on a fully-connected network is orthogonal equivariant. Similar results are achieved for $\ell_2$ regression and for adaptive training algorithms, e.g. Adam and AdaGrad, which are only permutation equivariant.
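
The equivariance property the proof relies on can be checked numerically on a toy problem. The following is a minimal sketch, not the authors' code: it trains a small two-layer ReLU network with full-batch gradient descent on squared loss, once on data X with initial first-layer weights W0, and once on rotated data QX with initial weights W0 Q^T. All names (`train`, `forward`, `rotation`), the toy labels, and the hyperparameters are assumptions chosen for illustration. Because a Gaussian W0 has the same distribution as W0 Q^T, identical predictions on matched inputs mean the learned predictor's distribution does not depend on the rotation.

```python
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def rotation(d, i, j, theta):
    """d x d rotation by angle theta in the (i, j) coordinate plane."""
    Q = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    Q[i][i] = Q[j][j] = math.cos(theta)
    Q[i][j], Q[j][i] = -math.sin(theta), math.sin(theta)
    return Q

def forward(W, a, x):
    """Two-layer ReLU net f(x) = a . relu(W x); also returns pre-activations."""
    h = matvec(W, x)
    return sum(ak * max(hk, 0.0) for ak, hk in zip(a, h)), h

def train(W, a, X, y, lr=0.02, steps=30):
    """Full-batch gradient descent on mean squared loss; inputs are not mutated."""
    W, a, n = [row[:] for row in W], a[:], len(X)
    for _ in range(steps):
        gW = [[0.0] * len(W[0]) for _ in W]
        ga = [0.0] * len(a)
        for x, t in zip(X, y):
            out, h = forward(W, a, x)
            err = (out - t) / n
            for k in range(len(a)):
                if h[k] > 0.0:  # ReLU is active for hidden unit k
                    ga[k] += err * h[k]
                    for j in range(len(x)):
                        gW[k][j] += err * a[k] * x[j]
        W = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, gW)]
        a = [ak - lr * g for ak, g in zip(a, ga)]
    return W, a

random.seed(0)
d, hidden, n = 4, 5, 8
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [1.0 if x[0] * x[1] > 0 else -1.0 for x in X]  # arbitrary toy labels
W0 = [[random.gauss(0, 1) / math.sqrt(d) for _ in range(d)] for _ in range(hidden)]
a0 = [random.gauss(0, 1) / math.sqrt(hidden) for _ in range(hidden)]

# An orthogonal matrix Q built as a product of plane rotations.
Q = matmul(matmul(rotation(d, 0, 1, 0.7), rotation(d, 2, 3, -1.3)),
           rotation(d, 1, 2, 0.4))

X_rot = [matvec(Q, x) for x in X]   # rotated inputs Q x, same labels
W0_rot = matmul(W0, transpose(Q))   # rotated initialization W0 Q^T

W, a = train(W0, a0, X, y)
W_rot, a_rot = train(W0_rot, a0, X_rot, y)

x_test = [random.gauss(0, 1) for _ in range(d)]
pred = forward(W, a, x_test)[0]
pred_rot = forward(W_rot, a_rot, matvec(Q, x_test))[0]
gap = abs(pred - pred_rot)
print("prediction gap between original and rotated runs:", gap)
```

The two runs should agree up to floating-point rounding, since every pre-activation satisfies (W Q^T)(Q x) = W x at each step. A convolutional architecture has no such symmetry, so the analogous check would be expected to fail for it.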

Comments:	24 pages, 1 figure; Accepted by ICLR 2021
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2010.08515 [cs.LG]
	(or arXiv:2010.08515v2 [cs.LG] for this version)

Submission history

From: Zhiyuan Li
[v1] Fri, 16 Oct 2020 17:15:39 GMT (418kb,D)
[v2] Tue, 4 May 2021 17:54:15 GMT (10224kb,D)
