Mathematics > Optimization and Control
Title: On the Influence of Momentum Acceleration on Online Learning
(Submitted on 14 Mar 2016 (this version), latest version 12 Oct 2016 (v4))
Abstract: The article examines in some detail the convergence rate and mean-square-error performance of momentum stochastic gradient methods in the constant step-size case and slow adaptation regime. The results establish that momentum methods are equivalent to the standard stochastic gradient method with a re-scaled (larger) step-size value. The size of the re-scaling is determined by the value of the momentum parameter. The equivalence result is established for all time instants and not only in steady-state. The analysis is carried out for general risk functions, and is not limited to quadratic risks. One notable conclusion is that the well-known benefits of momentum constructions for deterministic optimization problems do not necessarily carry over to the stochastic (online) setting when adaptation becomes necessary and when the true gradient vectors are not known beforehand. The analysis also suggests a method to retain some of the advantages of the momentum construction by employing a decaying momentum parameter, as opposed to a decaying step-size. In this way, the enhanced convergence rate during the initial stages of adaptation is preserved without the often-observed degradation in mean-square-deviation (MSD) performance.
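The equivalence claim in the abstract can be illustrated numerically. The sketch below is not taken from the article: it assumes a heavy-ball momentum recursion, a scalar least-mean-squares risk, and a re-scaling factor of 1/(1 - beta), which is the usual effective step-size for heavy-ball momentum (the abstract only says that the re-scaling depends on the momentum parameter). All names and parameter values (`simulate`, `mu`, `beta`, `sigma_v`, `w_true`) are hypothetical choices for illustration.

```python
import random


def simulate(mu=0.001, beta=0.9, n_iter=40000, sigma_v=0.1, w_true=1.0, seed=0):
    """Run heavy-ball momentum SGD and plain SGD on the same streaming data.

    Assumed model (not from the article): d = u * w_true + v, with u ~ N(0, 1)
    and v ~ N(0, sigma_v^2). Plain SGD uses the assumed re-scaled step-size
    mu / (1 - beta). Returns the time-averaged squared deviation of each
    method over the second half of the run.
    """
    rng = random.Random(seed)
    mu_sgd = mu / (1.0 - beta)      # assumed re-scaled (larger) step-size

    w_m, w_m_prev = 0.0, 0.0        # momentum iterate and its previous value
    w_s = 0.0                       # plain SGD iterate
    acc_m = acc_s = 0.0
    tail = n_iter // 2              # average over the last half only

    for i in range(n_iter):
        # One streaming sample, shared by both recursions.
        u = rng.gauss(0.0, 1.0)
        d = u * w_true + rng.gauss(0.0, sigma_v)

        # Heavy-ball momentum: gradient step plus beta times the last move.
        g_m = -u * (d - u * w_m)
        w_m, w_m_prev = w_m - mu * g_m + beta * (w_m - w_m_prev), w_m

        # Standard stochastic gradient step with the larger step-size.
        g_s = -u * (d - u * w_s)
        w_s = w_s - mu_sgd * g_s

        if i >= n_iter - tail:
            acc_m += (w_true - w_m) ** 2
            acc_s += (w_true - w_s) ** 2

    return acc_m / tail, acc_s / tail


msd_momentum, msd_sgd = simulate()
print("momentum MSD estimate:", msd_momentum)
print("re-scaled SGD MSD estimate:", msd_sgd)
```

Under these assumptions the two printed estimates should be of the same order, which is consistent with the abstract's statement that momentum buys no steady-state advantage over standard stochastic gradient descent run with a larger step-size. This is a single-run sanity check on one quadratic risk, not a reproduction of the article's analysis.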
Submission history
From: Kun Yuan
[v1] Mon, 14 Mar 2016 05:05:54 GMT (2598kb)
[v2] Tue, 29 Mar 2016 06:27:47 GMT (2598kb)
[v3] Mon, 1 Aug 2016 23:18:27 GMT (318kb)
[v4] Wed, 12 Oct 2016 05:19:07 GMT (551kb)