Title: Topology and Geometry of Deep Rectified Network Optimization Landscapes

Abstract: The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a high-dimensional non-convex problem. Some insights were recently gained using spin glass models and mean-field approximations, but at the expense of strongly simplifying the nonlinear nature of the model.
In this work, we make no such simplifying assumption and instead study conditions on the data distribution and model architecture that prevent the existence of bad local minima. Our theoretical work quantifies and formalizes two important folklore facts: (i) the landscape of deep linear networks has a radically different topology from that of deep half-rectified ones, and (ii) the energy landscape in the non-linear case is fundamentally controlled by the interplay between the smoothness of the data distribution and model over-parametrization. These results are in accordance with empirical practice and recent literature.
The conditioning of gradient descent is the next challenge we address. We study this question through the geometry of the level sets, and we introduce an algorithm to efficiently estimate the regularity of such sets on large-scale networks. Our empirical results show that these level sets remain connected throughout the entire learning phase, suggesting near-convex behavior, but that they become exponentially more curved as the energy level decays, in accordance with what is observed in practice with very low curvature attractors.
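
To make the notion of level-set connectedness concrete, the sketch below samples a loss along the straight segment between two parameter vectors and checks whether it stays below a chosen energy level. It is an illustration only, not the algorithm introduced in the paper: the function names (`max_loss_on_segment`, `segment_stays_below`) and the two toy losses are hypothetical, and a straight segment is the simplest possible path.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def max_loss_on_segment(
    loss: Callable[[Vector], float],
    theta_a: Vector,
    theta_b: Vector,
    num_points: int = 101,
) -> float:
    """Largest loss value sampled on the straight segment theta_a -> theta_b."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    worst = float("-inf")
    for i in range(num_points):
        t = i / (num_points - 1)
        # Convex combination of the two endpoints.
        point = [(1.0 - t) * a + t * b for a, b in zip(theta_a, theta_b)]
        worst = max(worst, loss(point))
    return worst


def segment_stays_below(
    loss: Callable[[Vector], float],
    theta_a: Vector,
    theta_b: Vector,
    level: float,
    num_points: int = 101,
) -> bool:
    """True if every sampled point on the segment has loss <= level."""
    return max_loss_on_segment(loss, theta_a, theta_b, num_points) <= level


# Hypothetical toy losses, used only for illustration.
def convex_loss(theta: Vector) -> float:
    return sum(x * x for x in theta)


def double_well_loss(theta: Vector) -> float:
    x = theta[0]
    return (x * x - 1.0) ** 2


if __name__ == "__main__":
    # Convex bowl: the segment never rises above its endpoints.
    print(segment_stays_below(convex_loss, [-1.0, 0.0], [1.0, 0.0], level=1.0))
    # Double well: the segment between the two minima crosses a barrier.
    print(segment_stays_below(double_well_loss, [-1.0], [1.0], level=0.5))
```

A segment that stays below the level shows that the two sampled points can be joined inside the sublevel set, up to the sampling resolution. A segment that fails the check does not by itself show that the set is disconnected in higher dimensions, since a curved path may still exist.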
Comments: 24 Pages (12 main + 12 Appendix), 4 Figures, 1 Table, Submitted to ICLR for review
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:1611.01540 [stat.ML]
  (or arXiv:1611.01540v1 [stat.ML] for this version)

Submission history

From: Daniel Freeman
[v1] Fri, 4 Nov 2016 21:17:42 GMT (389kb,D)
[v2] Sun, 20 Nov 2016 00:26:16 GMT (3055kb,D)
[v3] Sat, 25 Mar 2017 04:17:46 GMT (3069kb,D)
[v4] Thu, 1 Jun 2017 19:46:41 GMT (3100kb,D)