Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix Factorization

Liu, Tianyi; Li, Yan; Wei, Song; Zhou, Enlu; Zhao, Tuo

(Submitted on 24 Feb 2021)

Abstract: A large body of empirical evidence has corroborated the importance of noise in nonconvex optimization problems. The theory behind these observations, however, remains largely unknown. This paper studies this fundamental problem by investigating the nonconvex rectangular matrix factorization problem, which has infinitely many global minima due to rotation and scaling invariance. Hence, gradient descent (GD) can converge to any of these optima, depending on the initialization. In contrast, we show that a perturbed form of GD with an arbitrary initialization converges to a global optimum that is uniquely determined by the injected noise. Our result implies that the noise imposes an implicit bias toward certain optima. Numerical experiments are provided to support our theory.
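The invariance argument in the abstract can be seen on a toy problem. The following is a minimal sketch under our own assumptions, not the paper's method: the scalar objective f(x, y) = (x*y - m)^2 / 2 stands in for the rectangular problem (it is the 1x1 case of ||X Y^T - M||_F^2 / 2), and the helper names `loss`, `grad`, `gd` and `sharpness` are hypothetical. It illustrates three points: (i) scaling invariance, since (a, m/a) is a global minimum for every a != 0; (ii) plain GD nearly conserves the imbalance x^2 - y^2 (each step multiplies it by 1 - lr^2 * r^2, where r = x*y - m), so the initialization selects the minimum; and (iii) the nonzero Hessian eigenvalue at (a, m/a) is a^2 + m^2/a^2, which is smallest at the balanced point a^2 = m, the flattest minimum. The sketch does not model the injected noise or the perturbed GD analyzed in the paper.

```python
"""Toy sketch (not the paper's method): GD on the scalar factorization
f(x, y) = 0.5 * (x * y - m) ** 2, the 1x1 case of 0.5 * ||X Y^T - M||_F^2."""


def loss(x: float, y: float, m: float = 1.0) -> float:
    """Squared-error objective; zero exactly on the curve x * y == m."""
    return 0.5 * (x * y - m) ** 2


def grad(x: float, y: float, m: float = 1.0):
    """Partial derivatives (df/dx, df/dy) = (r * y, r * x) with r = x*y - m."""
    r = x * y - m
    return r * y, r * x


def gd(x: float, y: float, lr: float = 0.01, steps: int = 5000, m: float = 1.0):
    """Plain (noise-free) gradient descent, updating both factors at once.

    One step maps the imbalance x^2 - y^2 to (x^2 - y^2) * (1 - lr^2 * r^2),
    so for a small step size it is nearly conserved and the initialization
    decides which global minimum GD reaches.
    """
    for _ in range(steps):
        gx, gy = grad(x, y, m)
        x, y = x - lr * gx, y - lr * gy
    return x, y


def sharpness(x: float, y: float) -> float:
    """Largest Hessian eigenvalue of f at a global minimum (x * y == m).

    There the Hessian is [[y^2, m], [m, x^2]] with determinant
    x^2 * y^2 - m^2 = 0, so its eigenvalues are 0 and x^2 + y^2.
    """
    return x * x + y * y


if __name__ == "__main__":
    # Scaling invariance: (a, 1/a) is a global minimum for every a != 0,
    # but the sharpness a^2 + 1/a^2 is smallest at the balanced point a = 1.
    for a in (0.25, 0.5, 1.0, 2.0, 4.0):
        print(f"a={a}: loss={loss(a, 1 / a):.3g}, sharpness={sharpness(a, 1 / a):.4g}")

    # Different initializations lead GD to different global minima.
    for x0, y0 in ((2.0, 0.1), (0.1, 2.0), (1.0, 0.2)):
        x, y = gd(x0, y0)
        print(f"start ({x0}, {y0}) -> ({x:.4f}, {y:.4f}), "
              f"x^2 - y^2 = {x * x - y * y:.4f}, sharpness = {sharpness(x, y):.4f}")
```

Starting from (2.0, 0.1), GD ends near an unbalanced minimum with x^2 - y^2 close to its initial value 3.99, which is sharper than the balanced minimum (1, 1); swapping the initial factors mirrors the limit. In this toy model, which of the infinitely many minima GD reaches is fixed by where it starts.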

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2102.12430 [cs.LG]
	(or arXiv:2102.12430v1 [cs.LG] for this version)

Submission history

From: Tianyi Liu
[v1] Wed, 24 Feb 2021 17:50:17 GMT (12524kb,D)