We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

math.OC

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Mathematics > Optimization and Control

Title: Computing the Variance of Shuffling Stochastic Gradient Algorithms via Power Spectral Density Analysis

Abstract: When solving finite-sum minimization problems, two common alternatives to stochastic gradient descent (SGD) with theoretical benefits are random reshuffling (SGD-RR) and shuffle-once (SGD-SO), in which functions are sampled in cycles without replacement. Under a convenient stochastic noise approximation which holds experimentally, we study the stationary variances of the iterates of SGD, SGD-RR and SGD-SO, whose leading terms decrease in this order, and obtain simple approximations. To obtain our results, we study the power spectral density of the stochastic gradient noise sequences. Our analysis extends beyond SGD to SGD with momentum and to the stochastic Nesterov's accelerated gradient method. We perform experiments on quadratic objective functions to test the validity of our approximation and the correctness of our findings.
Comments: The code can be found at \url{this https URL}
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2206.00632 [math.OC]
  (or arXiv:2206.00632v1 [math.OC] for this version)

Submission history

From: Carles Domingo-Enrich [view email]
[v1] Wed, 1 Jun 2022 17:08:04 GMT (3725kb,D)

Link back to: arXiv, form interface, contact.