Computer Science > Machine Learning
Title: Is Local SGD Better than Minibatch SGD?
(Submitted on 18 Feb 2020 (v1), last revised 20 Jul 2020 (this version, v2))
Abstract: We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) We show that indeed local SGD does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
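To make the comparison in the abstract concrete, here is a minimal sketch of the two methods under the same budget: M machines, R communication rounds, and K stochastic gradients per machine per round. It is an illustration, not the authors' code. The function names (`local_sgd`, `minibatch_sgd`, `noisy_quadratic_grad`), the scalar toy objective, and the step size are all assumptions chosen for brevity.

```python
import random


def noisy_quadratic_grad(rng, mean=3.0, noise=1.0):
    """Return a stochastic gradient oracle for the toy objective
    F(x) = E[0.5 * (x - z)^2], z ~ N(mean, noise^2) (hypothetical example)."""
    def grad(x):
        return x - rng.gauss(mean, noise)
    return grad


def local_sgd(x0, machines, rounds, local_steps, lr, grad):
    """Each machine runs `local_steps` SGD steps from the shared iterate,
    then the local iterates are averaged once per round."""
    x = x0
    for _ in range(rounds):
        local_iterates = []
        for _ in range(machines):
            xm = x  # every machine starts the round at the averaged iterate
            for _ in range(local_steps):
                xm -= lr * grad(xm)
            local_iterates.append(xm)
        x = sum(local_iterates) / machines  # communication: average iterates
    return x


def minibatch_sgd(x0, machines, rounds, local_steps, lr, grad):
    """One SGD step per round, using all machines * local_steps gradients
    evaluated at the same point."""
    x = x0
    batch = machines * local_steps
    for _ in range(rounds):
        g = sum(grad(x) for _ in range(batch)) / batch
        x -= lr * g
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    grad = noisy_quadratic_grad(rng)
    # Same budget for both: 4 machines, 10 rounds, 5 gradients per machine per round.
    print("local SGD:    ", local_sgd(0.0, 4, 10, 5, 0.1, grad))
    print("minibatch SGD:", minibatch_sgd(0.0, 4, 10, 5, 0.1, grad))
```

Both routines consume M × K × R stochastic gradients and communicate R times; they differ only in whether the K gradients per round are taken at successive local iterates or all at the shared iterate. Results on this toy problem say nothing about the general convex case, where the abstract states that local SGD does not dominate minibatch SGD.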
Submission history
From: Blake Woodworth
[v1] Tue, 18 Feb 2020 19:22:43 GMT (259kb,D)
[v2] Mon, 20 Jul 2020 15:47:48 GMT (254kb,D)