Title: Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions

Abstract: We provide the first non-asymptotic analysis for finding stationary points of nonsmooth, nonconvex functions. In particular, we study the class of Hadamard semi-differentiable functions, perhaps the largest class of nonsmooth functions for which the chain rule of calculus holds. This class contains examples such as ReLU neural networks and others with non-differentiable activation functions. We first show that finding an $\epsilon$-stationary point with first-order methods is impossible in finite time. We then introduce the notion of $(\delta, \epsilon)$-stationarity, which allows for an $\epsilon$-approximate gradient to be the convex combination of generalized gradients evaluated at points within distance $\delta$ to the solution. We propose a series of randomized first-order methods and analyze their complexity of finding a $(\delta, \epsilon)$-stationary point. Furthermore, we provide a lower bound and show that our stochastic algorithm has min-max optimal dependence on $\delta$. Empirically, our methods perform well for training ReLU neural networks.
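To make the $(\delta, \epsilon)$-stationarity notion concrete, here is a minimal sketch on a one-dimensional toy problem, $f(x) = |x|$, which is not taken from the paper. It assumes one reading of the definition in the abstract: a point $x$ is $(\delta, \epsilon)$-stationary when the convex hull of the generalized gradients over the closed ball of radius $\delta$ around $x$ contains an element of norm at most $\epsilon$. The function names below are hypothetical and chosen only for illustration.

```python
def generalized_gradient_hull_abs(x: float, delta: float) -> tuple[float, float]:
    """Convex hull (an interval) of the generalized gradients of f(x) = |x|
    over the closed ball [x - delta, x + delta].

    Assumption: the generalized gradient of |x| is {sign(y)} for y != 0
    and the whole interval [-1, 1] at y = 0.
    """
    if x - delta > 0:      # ball lies strictly to the right of the kink
        return (1.0, 1.0)
    if x + delta < 0:      # ball lies strictly to the left of the kink
        return (-1.0, -1.0)
    return (-1.0, 1.0)     # ball contains the kink at 0


def is_delta_eps_stationary_abs(x: float, delta: float, eps: float) -> bool:
    """Check (delta, eps)-stationarity of x for f(x) = |x| (toy illustration)."""
    lo, hi = generalized_gradient_hull_abs(x, delta)
    # Distance from 0 to the interval [lo, hi].
    dist = 0.0 if lo <= 0.0 <= hi else min(abs(lo), abs(hi))
    return dist <= eps


if __name__ == "__main__":
    print(is_delta_eps_stationary_abs(0.05, delta=0.1, eps=0.01))
    print(is_delta_eps_stationary_abs(0.5, delta=0.1, eps=0.01))
```

In this toy case, every point within distance $\delta$ of the kink at $0$ passes the check even though the gradient has magnitude $1$ wherever it exists, which illustrates how the relaxed notion can be attained where plain $\epsilon$-stationarity cannot.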
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
Cite as: arXiv:2002.04130 [math.OC]
  (or arXiv:2002.04130v3 [math.OC] for this version)

Submission history

From: Jingzhao Zhang
[v1] Mon, 10 Feb 2020 23:23:04 GMT (156kb,D)
[v2] Sun, 16 Feb 2020 14:11:35 GMT (156kb,D)
[v3] Mon, 29 Jun 2020 14:53:13 GMT (997kb,D)
