Title: SGD with a Constant Large Learning Rate Can Converge to Local Maxima

Abstract: Previous works on stochastic gradient descent (SGD) often focus on its success. In this work, we construct worst-case optimization problems illustrating that, outside the regimes that previous works often assume, SGD can exhibit many strange and potentially undesirable behaviors. Specifically, we construct landscapes and data distributions such that (1) SGD converges to local maxima, (2) SGD escapes saddle points arbitrarily slowly, (3) SGD prefers sharp minima over flat ones, and (4) AMSGrad converges to local maxima. We also realize these results in a minimal neural network-like example. Our results highlight the importance of simultaneously analyzing minibatch sampling, discrete-time update rules, and realistic landscapes to understand the role of SGD in deep learning.
Comments: ICLR 2022 Spotlight
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as: arXiv:2107.11774 [cs.LG]
  (or arXiv:2107.11774v3 [cs.LG] for this version)
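
Claim (1) of the abstract can be made concrete with a small toy simulation. The sketch below is an illustration under stated assumptions, not the construction used in the paper: the per-sample loss x * w**2 / 2, the two-point distribution of x, the learning rate, and all function names are hypothetical choices made here. With this loss, one SGD step multiplies w by (1 - lr * x), so log|w| performs a random walk with drift E[log|1 - lr * x|]. If that drift is negative, w shrinks to 0 even when the mean curvature E[x] is negative, that is, even when w = 0 is a maximum of the expected loss.

```python
import math
import random

# Hypothetical toy problem (an assumption for illustration only):
# per-sample loss l(w, x) = x * w**2 / 2, with x drawn from a two-point distribution.
XS = (0.9, -1.1)     # possible per-sample curvatures
PROBS = (0.5, 0.5)   # their probabilities
LR = 1.0             # constant, large learning rate


def mean_curvature():
    """Curvature of the expected loss at w = 0; negative means w = 0 is a maximum."""
    return sum(p * x for p, x in zip(PROBS, XS))


def log_growth_rate(lr=LR):
    """Expected per-step change of log|w| under the update w <- (1 - lr * x) * w."""
    return sum(p * math.log(abs(1.0 - lr * x)) for p, x in zip(PROBS, XS))


def run_sgd(w0=1.0, steps=200, lr=LR, seed=0):
    """Run batch-size-1 SGD on the toy loss and return the final iterate."""
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        x = rng.choices(XS, weights=PROBS)[0]
        w -= lr * x * w  # the gradient of x * w**2 / 2 with respect to w is x * w
    return w


if __name__ == "__main__":
    print("mean curvature E[x]:", mean_curvature())
    print("expected change of log|w| per step:", log_growth_rate())
    print("final |w| after SGD:", abs(run_sgd()))
```

In this toy setting the contracting step (factor 0.1) outweighs the expanding step (factor 2.1) on the log scale, which is why the iterate collapses onto the maximum. With a small learning rate the drift turns positive and the iterate moves away from w = 0 instead.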

Submission history

From: Liu Ziyin
[v1] Sun, 25 Jul 2021 10:12:18 GMT (264kb,D)
[v2] Wed, 22 Sep 2021 02:20:17 GMT (362kb,D)
[v3] Mon, 14 Mar 2022 02:52:24 GMT (560kb,D)
