We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: SGD May Never Escape Saddle Points

Abstract: Stochastic gradient descent (SGD) has been deployed to solve highly non-linear and non-convex machine learning problems such as the training of deep neural networks. However, previous works on SGD often rely on restrictive and unrealistic assumptions about the nature of noise in SGD. In this work, we mathematically construct examples that defy previous understandings of SGD. For example, our constructions show that: (1) SGD may converge to a local maximum; (2) SGD may escape a saddle point arbitrarily slowly; (3) SGD may prefer sharp minima over the flat ones; and (4) AMSGrad may converge to a local maximum. We also show the relevance of our result to deep learning by presenting a minimal neural network example. Our result suggests that the noise structure of SGD might be more important than the loss landscape in neural network training and that future research should focus on deriving the actual noise structure in deep learning.
Comments: Typoes fixed
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as: arXiv:2107.11774 [cs.LG]
  (or arXiv:2107.11774v2 [cs.LG] for this version)

Submission history

From: Liu Ziyin [view email]
[v1] Sun, 25 Jul 2021 10:12:18 GMT (264kb,D)
[v2] Wed, 22 Sep 2021 02:20:17 GMT (362kb,D)

Link back to: arXiv, form interface, contact.