Title: A Unified and Refined Convergence Analysis for Non-Convex Decentralized Learning

Abstract: We study the consensus decentralized optimization problem where the objective function is the average of $n$ agents' private non-convex cost functions; moreover, the agents can only communicate with their neighbors on a given network topology. We consider the stochastic online setting, in which each agent can only access a noisy estimate of its gradient. Many decentralized methods can solve such problems, including EXTRA, Exact-Diffusion/D$^2$, and gradient-tracking. Unlike the famed $\small \text{DSGD}$ algorithm, these methods have been shown to be robust to the heterogeneity of the local cost functions. However, the established convergence rates for these methods indicate that their sensitivity to the network topology is worse than that of $\small \text{DSGD}$. Such theoretical results imply that these methods can perform much worse than $\small \text{DSGD}$ over sparse networks, which contradicts empirical experiments, where $\small \text{DSGD}$ is observed to be more sensitive to the network topology.
In this work, we study a general stochastic unified decentralized algorithm ($\small\textbf{SUDA}$) that includes the above methods as special cases. We establish the convergence of $\small\textbf{SUDA}$ both in the non-convex setting and under the Polyak-Lojasiewicz condition. Our results provide improved network-topology-dependent bounds for these methods (such as Exact-Diffusion/D$^2$ and gradient-tracking) compared with the existing literature. Moreover, our results show that these methods are less sensitive to the network topology than $\small \text{DSGD}$, which agrees with numerical experiments.
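For concreteness, the setting described in the abstract can be sketched in standard notation. The symbols below ($f_i$, $x_i^k$, $g_i^k$, $w_{ij}$, $\mathcal{N}_i$, $\alpha$, $\mu$, $f^\star$) are assumptions introduced here for illustration and are not taken from the paper; the $\small \text{DSGD}$ update shown is one common form of that algorithm, and other variants exist.

```latex
% Consensus problem: minimize the average of the n agents' private costs f_i
\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)

% Polyak-Lojasiewicz condition with constant \mu > 0, where f^\star = \min_x f(x)
\frac{1}{2} \, \| \nabla f(x) \|^2 \;\ge\; \mu \, \bigl( f(x) - f^\star \bigr)
\quad \text{for all } x

% One common form of the DSGD update at agent i:
% average with neighbors, then take a local stochastic gradient step.
% \mathcal{N}_i: neighbors of agent i (including i), w_{ij}: mixing weights,
% g_i^k: noisy estimate of \nabla f_i(x_i^k), \alpha: step size
x_i^{k+1} = \sum_{j \in \mathcal{N}_i} w_{ij} \, x_j^{k} - \alpha \, g_i^{k}
```

In this notation, the weights $w_{ij}$ encode the network topology: sparser networks mix information more slowly, which is the kind of topology sensitivity that the convergence bounds discussed above aim to quantify.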
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Cite as: arXiv:2110.09993 [cs.DC]
  (or arXiv:2110.09993v1 [cs.DC] for this version)

Submission history

From: Sulaiman Alghunaim
[v1] Tue, 19 Oct 2021 14:04:26 GMT (1197kb,D)
