Title: Linear Regularizers Enforce the Strict Saddle Property

Abstract: Satisfaction of the strict saddle property has become a standard assumption in non-convex optimization, and it ensures that gradient descent will almost always escape saddle points. However, functions exist in machine learning that do not satisfy this property, such as the loss function of a neural network with at least two hidden layers. Gradient descent may converge to non-strict saddle points of such functions, and there do not currently exist any first-order methods that reliably escape non-strict saddle points. To address this need, we demonstrate that regularizing a function with a linear term enforces the strict saddle property, and we provide justification for only regularizing locally, i.e., when the norm of the gradient falls below a certain threshold. We analyze bifurcations that may result from this form of regularization, and then we provide a selection rule for regularizers that depends only on the gradient of an objective function. This rule is shown to guarantee that gradient descent will escape the neighborhoods around a broad class of non-strict saddle points. This behavior is demonstrated on common examples of non-strict saddle points, and numerical results are provided from the training of a neural network.
Comments: 11 pages, 6 figures
Subjects: Optimization and Control (math.OC)
Cite as: arXiv:2205.09160 [math.OC]
  (or arXiv:2205.09160v2 [math.OC] for this version)
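
The local regularization described in the abstract can be illustrated with a minimal sketch. It uses the one-dimensional function f(x) = x^3, which has a non-strict saddle at x = 0, and adds a linear term a*x only when the gradient magnitude falls below a threshold. The function names, the step size, the threshold, and the choice a = eps * sign(f'(x)) are all assumptions made for illustration; they are not the selection rule from the paper. Since f is unbounded below, "escape" here only means that the iterate leaves the neighborhood of the saddle.

```python
def grad_f(x):
    """Gradient of f(x) = x**3, which has a non-strict saddle at x = 0."""
    return 3.0 * x**2


def descend(x0, step=0.01, iters=5000, threshold=None, eps=0.0, x_stop=-1.0):
    """Gradient descent with an optional, locally applied linear regularizer.

    When |f'(x)| < threshold, the update uses the gradient of f(x) + a * x,
    where a = eps * sign(f'(x)) is a hypothetical choice for this sketch
    (an assumption, not the rule derived in the paper).
    """
    x = x0
    for _ in range(iters):
        g = grad_f(x)
        if threshold is not None and abs(g) < threshold:
            a = eps if g >= 0 else -eps  # linear term aligned with the gradient
            g = g + a                    # gradient of the regularized function
        x = x - step * g
        if x < x_stop:  # iterate has left the neighborhood of the saddle
            break
    return x


# Plain gradient descent stalls just above the saddle at x = 0.
plain = descend(0.5)

# With the local linear term, the iterate passes through x = 0 and moves on.
regularized = descend(0.5, threshold=1e-2, eps=1e-2)

print(f"plain: {plain:.5f}, regularized: {regularized:.5f}")
```

In this sketch the unregularized iterate approaches x = 0 from the right and never crosses it, while the regularized run keeps a gradient of at least eps inside the threshold region and therefore continues past the saddle.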

Submission history

From: Matthew Ubl
[v1] Wed, 18 May 2022 18:21:51 GMT (399kb,D)
[v2] Thu, 2 Jun 2022 13:18:58 GMT (413kb,D)
