Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD

Candela, Rosa; Franzese, Giulio; Filippone, Maurizio; Michiardi, Pietro

(Submitted on 21 Oct 2019 (v1), last revised 18 Jan 2021 (this version, v3))

Abstract: Large-scale machine learning increasingly relies on distributed optimization, whereby several machines contribute to the training process of a statistical model. In this work we study the performance of asynchronous, distributed settings when applying sparsification, a technique used to reduce communication overheads. In particular, for the first time in an asynchronous, non-convex setting, we theoretically prove that, in the presence of staleness, sparsification does not harm SGD performance: the ergodic convergence rate matches the known result of standard SGD, that is, $\mathcal{O} \left( 1/\sqrt{T} \right)$. We also carry out an empirical study to complement our theory, and confirm that the effects of sparsification on the convergence rate are negligible when compared to 'vanilla' SGD, even in the challenging scenario of an asynchronous, distributed system.
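
As an illustration of the two ingredients named in the abstract, the sketch below combines gradient sparsification with stale gradients in a toy, single-process loop. It makes several assumptions that the abstract does not state: the sparsifier is taken to be top-k by magnitude (a common choice), staleness is modeled as a fixed delay, and the names `top_k_sparsify` and `stale_sparse_sgd` are hypothetical. It is not the authors' implementation or experimental setup.

```python
import heapq


def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero out the rest.

    Assumption: top-k selection stands in for the sparsification operator.
    """
    if k <= 0:
        return [0.0] * len(grad)
    if k >= len(grad):
        return list(grad)
    # Indices of the k entries with the largest absolute value.
    keep = set(heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i])))
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def stale_sparse_sgd(grad_fn, x0, lr, k, delay, steps):
    """Toy simulation of SGD with stale, sparsified gradients.

    Assumption: staleness is a fixed `delay`, so the update at step t uses a
    gradient evaluated at the iterate from `delay` steps earlier.
    """
    history = [list(x0)]  # history[t] is the iterate before step t
    x = list(x0)
    for t in range(steps):
        stale_x = history[max(0, t - delay)]
        g = top_k_sparsify(grad_fn(stale_x), k)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        history.append(x)
    return x
```

For example, calling `stale_sparse_sgd(lambda x: list(x), [1.0, -2.0, 0.5], lr=0.1, k=1, delay=2, steps=200)` runs the loop on the quadratic $f(x) = \tfrac{1}{2}\|x\|^2$, whose gradient is $x$ itself, while sending only one coordinate per step. In a real distributed system the delay would vary per worker and the sparsified vector would be sent as index-value pairs.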

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1910.09466 [cs.LG]
	(or arXiv:1910.09466v3 [cs.LG] for this version)

Submission history

From: Giulio Franzese
[v1] Mon, 21 Oct 2019 15:51:16 GMT (1219kb,D)
[v2] Thu, 9 Jul 2020 15:01:06 GMT (587kb)
[v3] Mon, 18 Jan 2021 08:41:09 GMT (630kb)
