Accumulated Gradient Normalization

Hermans, Joeri; Spanakis, Gerasimos; Möckel, Rico

(Submitted on 6 Oct 2017)

Abstract: This work addresses the instability in asynchronous data-parallel optimization. It does so by introducing a novel distributed optimizer that is able to efficiently optimize a centralized model under communication constraints. The optimizer achieves this by pushing a normalized sequence of first-order gradients to a parameter server. This implies that the magnitude of a worker delta is smaller than that of an accumulated gradient, and that it provides a better direction towards a minimum than individual first-order gradients. This in turn forces possible implicit momentum fluctuations to be more aligned, since we assume that all workers contribute towards a single minimum. As a result, our approach mitigates the parameter staleness problem more effectively, since staleness in asynchrony induces (implicit) momentum. We show empirically that it also achieves a better convergence rate than other optimizers such as asynchronous EASGD and DynSGD.
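To make the magnitude argument concrete, here is a toy Python model. It is not the authors' implementation. It assumes that "normalized" means a worker sums the updates of several local SGD steps and divides that sum by the number of steps before committing it to the parameter server. The names `local_updates`, `run_parameter_server` and `num_local_steps`, the quadratic loss, and the staleness model (every worker computes its delta from the same snapshot of the central variable) are all hypothetical illustrations. EASGD and DynSGD are not modeled.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def local_updates(theta: Vector, grad_fn: Callable[[Vector], Vector],
                  lr: float, num_local_steps: int) -> Tuple[Vector, Vector]:
    """Run num_local_steps of local SGD starting from theta.

    Returns (accumulated, normalized): the summed local updates, and that sum
    divided by num_local_steps (the ASSUMED normalization, for illustration).
    """
    local = list(theta)
    accumulated = [0.0] * len(theta)
    for _ in range(num_local_steps):
        step = [-lr * g for g in grad_fn(local)]
        local = [p + s for p, s in zip(local, step)]
        accumulated = [a + s for a, s in zip(accumulated, step)]
    normalized = [a / num_local_steps for a in accumulated]
    return accumulated, normalized


def l2(v: Vector) -> float:
    # Euclidean norm of a vector.
    return math.sqrt(sum(x * x for x in v))


def quadratic_grad(theta: Vector) -> Vector:
    # Gradient of the toy loss f(theta) = sum(theta_i ** 2); minimum at 0.
    return [2.0 * x for x in theta]


def run_parameter_server(num_workers: int, rounds: int, normalize: bool,
                         lr: float = 0.1, num_local_steps: int = 4) -> Vector:
    """Hypothetical staleness model: in each round every worker pulls the same
    snapshot of the central variable, then all workers commit their deltas one
    after another, so every commit after the first uses stale parameters."""
    center = [3.0, -4.0]
    for _ in range(rounds):
        snapshot = list(center)
        for _ in range(num_workers):
            accumulated, normalized = local_updates(
                snapshot, quadratic_grad, lr, num_local_steps)
            delta = normalized if normalize else accumulated
            center = [c + d for c, d in zip(center, delta)]
    return center


acc, norm_delta = local_updates([3.0, -4.0], quadratic_grad, 0.1, 4)
print(round(l2(acc), 4), round(l2(norm_delta), 4))  # 2.952 0.738
print(l2(run_parameter_server(4, 10, normalize=True)))   # shrinks towards 0
print(l2(run_parameter_server(4, 10, normalize=False)))  # grows: stale deltas overshoot
```

With four local steps the normalized delta is a quarter of the accumulated one. In this toy setting, four workers committing stale normalized deltas contract towards the minimum, while the same workers committing raw accumulated deltas overshoot and diverge. This only illustrates the intuition; the convergence results reported in the paper come from its own experiments.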

Comments:	16 pages, 12 figures, ACML2017
Subjects:	Machine Learning (stat.ML); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:1710.02368 [stat.ML]
	(or arXiv:1710.02368v1 [stat.ML] for this version)

Submission history

From: Joeri Hermans
[v1] Fri, 6 Oct 2017 12:32:16 GMT (8344kb,D)