A case where a spindly two-layer linear network whips any neural network with a fully connected input layer

Warmuth, Manfred K.; Kotłowski, Wojciech; Amid, Ehsan

(Submitted on 16 Oct 2020)

Abstract: It was conjectured that no neural network, of any structure and with arbitrary differentiable transfer functions at the nodes, can learn the following problem sample-efficiently when trained with gradient descent: the instances are the rows of a $d$-dimensional Hadamard matrix and the target is one of the features, i.e., it is very sparse. We essentially prove this conjecture: we show that after receiving a random training set of size $k < d$, the expected square loss is still $1-\frac{k}{d-1}$. The only requirement is that the input layer is fully connected and the initial weight vectors of the input nodes are chosen from a rotation-invariant distribution.
Surprisingly, the same type of problem can be solved drastically more efficiently by a simple 2-layer linear neural network in which the $d$ inputs are connected to the output node by chains of length 2, so that the input layer has only one edge per input. When such a network is trained by gradient descent, it has been shown that its expected square loss is $\frac{\log d}{k}$.
Our lower bounds essentially show that a sparse input layer is needed to learn sparse targets sample-efficiently with gradient descent when the number of examples is less than the number of input features.
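
The contrast described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions: the Hadamard matrix comes from the Sylvester construction, the fully connected model is a single linear neuron started at zero (a trivially rotation-invariant initialization, for which gradient descent converges to the minimum-norm interpolator, written here in closed form), and the spindly network uses the weights `u[i] * v[i]` with both factors trained by plain gradient descent. The names `fit_fully_connected` and `fit_spindly`, and the values of `d`, `k`, `target`, `init`, `lr`, and `steps`, are all hypothetical choices for illustration.

```python
import random


def sylvester_hadamard(d):
    """Return a d x d Hadamard matrix (d must be a power of two)."""
    H = [[1]]
    while len(H) < d:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_sq_loss(w, X, y):
    """Average square loss of the linear predictor w on examples (X, y)."""
    return sum((dot(w, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def fit_fully_connected(X, y, d):
    # Assumption: one linear neuron, zero initialization. Gradient descent
    # then converges to the minimum-norm interpolator X^T (X X^T)^{-1} y.
    # Hadamard rows are orthogonal with squared norm d, so X X^T = d * I.
    return [sum(t * x[m] for x, t in zip(X, y)) / d for m in range(d)]


def fit_spindly(X, y, d, init=0.01, lr=0.05, steps=1000):
    # Each input i reaches the output through a chain of two weights,
    # so the effective linear weight on input i is u[i] * v[i].
    u = [init] * d
    v = [init] * d
    k = len(X)
    for _ in range(steps):
        w = [a * b for a, b in zip(u, v)]
        resid = [dot(w, x) - t for x, t in zip(X, y)]
        # Gradient of (1 / 2k) * sum of squared residuals w.r.t. w.
        g = [sum(r * x[m] for r, x in zip(resid, X)) / k for m in range(d)]
        # Chain rule: dL/du = g * v and dL/dv = g * u (simultaneous update).
        u, v = ([a - lr * gm * b for a, gm, b in zip(u, g, v)],
                [b - lr * gm * a for a, gm, b in zip(u, g, v)])
    return [a * b for a, b in zip(u, v)]


d, k, target = 32, 16, 5          # hypothetical problem size and target feature
H = sylvester_hadamard(d)
rng = random.Random(0)
train_idx = rng.sample(range(d), k)
test_idx = [i for i in range(d) if i not in train_idx]

X_train = [H[i] for i in train_idx]
y_train = [H[i][target] for i in train_idx]
X_test = [H[i] for i in test_idx]
y_test = [H[i][target] for i in test_idx]

w_fc = fit_fully_connected(X_train, y_train, d)
w_sp = fit_spindly(X_train, y_train, d)

loss_fc = mean_sq_loss(w_fc, X_test, y_test)
loss_sp = mean_sq_loss(w_sp, X_test, y_test)
print("fully connected, loss on unseen rows:", loss_fc)
print("spindly network, loss on unseen rows:", loss_sp)
```

In this zero-initialized linear case the fully connected solution lies in the span of the training rows, and since unseen Hadamard rows are orthogonal to that span, it predicts 0 on every unseen row. The spindly network's multiplicative-style updates instead concentrate weight on the single relevant feature. This toy comparison only illustrates the mechanism and does not reproduce the paper's bounds.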

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2010.08625 [cs.LG]
	(or arXiv:2010.08625v1 [cs.LG] for this version)

Submission history

From: Ehsan Amid
[v1] Fri, 16 Oct 2020 20:49:58 GMT (860kb,D)
