Mathematics > Optimization and Control
Title: Linear Regularizers Enforce the Strict Saddle Property
(Submitted on 18 May 2022 (v1), last revised 2 Jun 2022 (this version, v2))
Abstract: Satisfaction of the strict saddle property has become a standard assumption in non-convex optimization, and it ensures that gradient descent will almost always escape saddle points. However, functions exist in machine learning that do not satisfy this property, such as the loss function of a neural network with at least two hidden layers. Gradient descent may converge to non-strict saddle points of such functions, and there do not currently exist any first-order methods that reliably escape non-strict saddle points. To address this need, we demonstrate that regularizing a function with a linear term enforces the strict saddle property, and we provide justification for only regularizing locally, i.e., when the norm of the gradient falls below a certain threshold. We analyze bifurcations that may result from this form of regularization, and then we provide a selection rule for regularizers that depends only on the gradient of an objective function. This rule is shown to guarantee that gradient descent will escape the neighborhoods around a broad class of non-strict saddle points. This behavior is demonstrated on common examples of non-strict saddle points, and numerical results are provided from the training of a neural network.
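The idea of regularizing locally with a linear term can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the paper's algorithm: the test function (the monkey saddle f(x, y) = x^3 - 3xy^2, whose Hessian vanishes at the origin), the fixed regularizer vector `a`, and the values of `threshold`, `step`, and `radius` are all hypothetical choices, and the paper's gradient-based selection rule for regularizers is not reproduced here. The sketch adds `a` to the gradient only when the gradient norm of the objective falls below `threshold`, and reports how many iterations it takes to leave a ball around the saddle.

```python
import math


def grad_monkey_saddle(x, y):
    """Gradient of f(x, y) = x**3 - 3*x*y**2 (degenerate saddle at the origin)."""
    return 3 * x**2 - 3 * y**2, -6 * x * y


def steps_to_escape(start, a, threshold=0.05, step=0.1, radius=0.5, max_iters=1000):
    """Run gradient descent with a locally applied linear regularizer a^T x.

    Returns the iteration count at which the iterate first leaves the ball of
    the given radius around the origin, or None if it never does.
    All parameter values are illustrative assumptions.
    """
    x, y = start
    for k in range(max_iters):
        if math.hypot(x, y) > radius:
            return k
        gx, gy = grad_monkey_saddle(x, y)
        # Regularize only locally: the gradient of f + a^T x is grad f + a,
        # and it is used only where grad f is small.
        if math.hypot(gx, gy) < threshold:
            gx, gy = gx + a[0], gy + a[1]
        x, y = x - step * gx, y - step * gy
    return None


if __name__ == "__main__":
    # Plain gradient descent started exactly at the saddle never moves.
    print("unregularized:", steps_to_escape((0.0, 0.0), a=(0.0, 0.0)))
    # With a small linear term the iterate is pushed out of the neighborhood.
    print("regularized:  ", steps_to_escape((0.0, 0.0), a=(0.1, 0.0)))
```

Starting exactly at the saddle makes the contrast clear: without the linear term the gradient is zero and the iterate stays put, while with it the iterate drifts until the objective's own gradient exceeds the threshold and unregularized descent takes over.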
Submission history
From: Matthew Ubl
[v1] Wed, 18 May 2022 18:21:51 GMT (399kb,D)
[v2] Thu, 2 Jun 2022 13:18:58 GMT (413kb,D)