Understanding Deep Neural Networks with Rectified Linear Units

Arora, Raman; Basu, Amitabh; Mianjy, Poorya; Mukherjee, Anirbit

(Submitted on 4 Nov 2016 (v1), revised 11 Nov 2016 (this version, v2), latest version 28 Feb 2018 (v6))

Abstract: In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give the first-ever polynomial (in data size and circuit size) time algorithm to train a ReLU DNN with one hidden layer and a single input to global optimality. This follows from our complete characterization of the ReLU DNN function class whereby we show that an $\mathbb{R}^n \to \mathbb{R}$ function is representable by a ReLU DNN if and only if it is a continuous piecewise linear function. The main tool used to prove this characterization is an elegant result from tropical geometry. Further, for the $n=1$ case, we show that a single hidden layer suffices to express all piecewise linear functions, and we give tight bounds for the size of such a ReLU DNN.
We follow up with gap results showing that there is a smoothly parameterized family of $\mathbb{R}\to \mathbb{R}$ "hard" functions that lead to an exponential blow-up in size if the number of layers is decreased by a small amount. An example consequence of our gap theorem is that for every natural number $N$, there exists a function representable by a ReLU DNN with depth $N^2+1$ and total size $N^3$, such that any ReLU DNN with depth at most $N + 1$ will require at least $\frac12N^{N+1}-1$ total nodes.
Finally, we construct a family of $\mathbb{R}^n\to \mathbb{R}$ functions for $n\geq 2$ (also smoothly parameterized), whose number of affine pieces scales exponentially with the dimension $n$ at any fixed size and depth. To the best of our knowledge, such a construction with exponential dependence on $n$ has not been achieved by previous families of "hard" functions in the neural nets literature.
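
As an illustration of the $n=1$ claim above, the following is a minimal sketch of one standard way to write a continuous piecewise linear function on $\mathbb{R}$ as a single-hidden-layer ReLU network with an affine output layer. It is not taken from the paper and may differ from the authors' construction; the function names (`build_one_hidden_layer`, `network`, `reference`) and the example breakpoints and slopes are hypothetical. The sketch assumes at least one breakpoint and uses one hidden unit per affine piece; it does not attempt to reproduce the tight size bounds mentioned in the abstract.

```python
def relu(z):
    """Rectified linear unit on a scalar."""
    return max(z, 0.0)


def build_one_hidden_layer(breakpoints, slopes, value_at_first):
    """Build a one-hidden-layer ReLU network for a 1-D continuous
    piecewise linear function (illustrative construction, an assumption).

    breakpoints: strictly increasing b[0] < ... < b[k-1], with k >= 1
    slopes: k + 1 slopes; slopes[0] applies left of b[0], slopes[k] right of b[k-1]
    value_at_first: the function value at b[0]

    Returns (hidden, out_weights, out_bias), where each hidden unit (w, c)
    computes relu(w * x + c).
    """
    assert len(breakpoints) >= 1
    assert len(slopes) == len(breakpoints) + 1
    assert all(a < b for a, b in zip(breakpoints, breakpoints[1:]))

    b0 = breakpoints[0]
    # One left-facing unit for x < b0 and one right-facing unit for x >= b0.
    hidden = [(-1.0, b0), (1.0, -b0)]
    out_weights = [-slopes[0], slopes[1]]
    # Each later breakpoint adds a unit that changes the slope by the jump there.
    for j in range(1, len(breakpoints)):
        hidden.append((1.0, -breakpoints[j]))
        out_weights.append(slopes[j + 1] - slopes[j])
    return hidden, out_weights, value_at_first


def network(x, hidden, out_weights, out_bias):
    """Evaluate the network: affine output of the ReLU hidden layer."""
    return out_bias + sum(a * relu(w * x + c)
                          for (w, c), a in zip(hidden, out_weights))


def reference(x, breakpoints, slopes, value_at_first):
    """Evaluate the same function directly by walking along its pieces."""
    if x < breakpoints[0]:
        return value_at_first + slopes[0] * (x - breakpoints[0])
    y = value_at_first
    for j, left in enumerate(breakpoints):
        right = breakpoints[j + 1] if j + 1 < len(breakpoints) else float("inf")
        y += slopes[j + 1] * (min(x, right) - left)
        if x <= right:
            break
    return y


# Hypothetical example: 4 affine pieces with breakpoints at -1, 0 and 2.
bps = [-1.0, 0.0, 2.0]
sl = [0.0, 1.0, -1.0, 0.5]
hidden, out_w, out_b = build_one_hidden_layer(bps, sl, 0.0)
for x in (-3.0, -1.0, 0.0, 2.0, 4.0):
    print(x, network(x, hidden, out_w, out_b), reference(x, bps, sl, 0.0))
```

In this sketch the output weights are the slope changes at each breakpoint, so the network's slope on any interval is the running sum of those changes, which is why the two evaluations agree.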

Comments:	In this updated version of the paper we incorporate a suggestion by Adam Klivans which now makes our training algorithm run in time polynomial in the data size as well as the circuit size
Subjects:	Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn); Artificial Intelligence (cs.AI); Computational Complexity (cs.CC); Machine Learning (stat.ML)
Cite as:	arXiv:1611.01491 [cs.LG]
	(or arXiv:1611.01491v2 [cs.LG] for this version)

Submission history

From: Anirbit Mukherjee
[v1] Fri, 4 Nov 2016 18:54:50 GMT (545kb,D)
[v2] Fri, 11 Nov 2016 20:25:56 GMT (546kb,D)
[v3] Sat, 26 Nov 2016 17:38:11 GMT (545kb,D)
[v4] Mon, 29 May 2017 20:06:50 GMT (536kb,D)
[v5] Tue, 18 Jul 2017 17:17:14 GMT (536kb,D)
[v6] Wed, 28 Feb 2018 02:23:47 GMT (571kb,D)
