A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness

Sun, Zhenyu; Wei, Ermin

Full-text links:

Download:

Current browse context:

cs.LG

< prev | next >

new | recent | 2403

Computer Science > Machine Learning

Title: A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness

Authors: Zhenyu Sun, Ermin Wei

(Submitted on 22 Mar 2024)

Abstract: Classical convergence analyses for optimization algorithms rely on the widely-adopted uniform smoothness assumption. However, recent experimental studies have demonstrated that many machine learning problems exhibit non-uniform smoothness, meaning the smoothness factor is a function of the model parameter instead of a universal constant. In particular, it has been observed that the smoothness grows with respect to the gradient norm along the training trajectory. Motivated by this phenomenon, the recently introduced $(L_0, L_1)$-smoothness is a more general notion, compared to traditional $L$-smoothness, that captures such positive relationship between smoothness and gradient norm. Under this type of non-uniform smoothness, existing literature has designed stochastic first-order algorithms by utilizing gradient clipping techniques to obtain the optimal $\mathcal{O}(\epsilon^{-3})$ sample complexity for finding an $\epsilon$-approximate first-order stationary solution. Nevertheless, the studies of quasi-Newton methods are still lacking. Considering higher accuracy and more robustness for quasi-Newton methods, in this paper we propose a fast stochastic quasi-Newton method when there exists non-uniformity in smoothness. Leveraging gradient clipping and variance reduction, our algorithm can achieve the best-known $\mathcal{O}(\epsilon^{-3})$ sample complexity and enjoys convergence speedup with simple hyperparameter tuning. Our numerical experiments show that our proposed algorithm outperforms the state-of-the-art approaches.

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2403.15244 [cs.LG]
	(or arXiv:2403.15244v1 [cs.LG] for this version)

Submission history

From: Zhenyu Sun [view email]
[v1] Fri, 22 Mar 2024 14:40:29 GMT (351kb)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2403.15244

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Machine Learning

Title: A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness

Submission history