We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

math.OC

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Mathematics > Optimization and Control

Title: Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent

Abstract: Asynchronous parallel optimization algorithms for solving large-scale machine learning problems have drawn significant attention from academia to industry recently. This paper proposes a novel algorithm, decoupled asynchronous proximal stochastic gradient descent (DAP-SGD), to minimize an objective function that is the composite of the average of multiple empirical losses and a regularization term. Unlike the traditional asynchronous proximal stochastic gradient descent (TAP-SGD) in which the master carries much of the computation load, the proposed algorithm off-loads the majority of computation tasks from the master to workers, and leaves the master to conduct simple addition operations. This strategy yields an easy-to-parallelize algorithm, whose performance is justified by theoretical convergence analyses. To be specific, DAP-SGD achieves an $O(\log T/T)$ rate when the step-size is diminishing and an ergodic $O(1/\sqrt{T})$ rate when the step-size is constant, where $T$ is the number of total iterations.
Comments: 19 pages
Subjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1605.06619 [math.OC]
  (or arXiv:1605.06619v1 [math.OC] for this version)

Submission history

From: Yitan Li [view email]
[v1] Sat, 21 May 2016 10:27:50 GMT (38kb)

Link back to: arXiv, form interface, contact.