Title: A Fundamental Tradeoff between Computation and Communication in Distributed Computing

Abstract: How can we optimally trade extra computing power to reduce the communication load in distributed computing? We answer this question by characterizing a fundamental tradeoff between computation and communication in distributed computing: the two are inversely proportional to each other.
More specifically, a general distributed computing framework, motivated by commonly used structures like MapReduce, is considered, where the overall computation is decomposed into computing a set of "Map" and "Reduce" functions distributedly across multiple computing nodes. A coded scheme, named "Coded Distributed Computing" (CDC), is proposed to demonstrate that increasing the computation load of the Map functions by a factor of $r$ (i.e., evaluating each function at $r$ carefully chosen nodes) can create novel coding opportunities that reduce the communication load by the same factor.
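As a toy illustration of the coding opportunity described above, the following Python sketch uses $K = 3$ nodes, $N = 6$ files, $Q = 3$ Reduce functions, and $r = 2$. The file placement, the XOR pairing, and the names `files_at`, `coded_messages`, and `run_shuffle` are assumptions made for this example; the abstract does not specify them, and the random bytes only stand in for real Map outputs.

```python
# Toy CDC-style shuffle: K = 3 nodes, N = 6 files, Q = 3 Reduce functions, r = 2.
import random

# Hypothetical placement: every file is mapped at exactly r = 2 nodes.
files_at = {1: {1, 2, 3, 4}, 2: {3, 4, 5, 6}, 3: {1, 2, 5, 6}}

# v[(q, n)] stands for the intermediate value of Reduce function q on file n.
# Random bytes are placeholders for real Map outputs.
rng = random.Random(0)
v = {(q, n): rng.randrange(256) for q in (1, 2, 3) for n in range(1, 7)}

# Node q reduces function q, so it is missing v[(q, n)] for files it did not map.
missing = {q: sorted(set(range(1, 7)) - files_at[q]) for q in files_at}

# Each node multicasts one XOR of two values; each value is wanted by one
# receiver and is already computable by the other receiver.
# Entry format: (sender, (q, n), (q', n')).
coded_messages = [
    (1, (2, 1), (3, 3)),
    (2, (1, 5), (3, 4)),
    (3, (1, 6), (2, 2)),
]

def run_shuffle():
    """Return {node: {file: recovered value}} after the coded multicasts."""
    recovered = {q: {} for q in files_at}
    for sender, a, b in coded_messages:
        # The sender mapped both files, so it can form the XOR.
        assert a[1] in files_at[sender] and b[1] in files_at[sender]
        packet = v[a] ^ v[b]
        for wanted, side in ((a, b), (b, a)):
            receiver = wanted[0]
            # The receiver mapped the file behind the side value, so it can cancel it.
            assert side[1] in files_at[receiver]
            recovered[receiver][wanted[1]] = packet ^ v[side]
    return recovered

recovered = run_shuffle()
uncoded_transmissions = sum(len(m) for m in missing.values())
print(len(coded_messages), "coded vs", uncoded_transmissions, "uncoded transmissions")
```

In this instance three coded multicasts replace six uncoded unicasts, a reduction by the factor $r = 2$, and every node still recovers the intermediate values it lacks.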
An information-theoretic lower bound on the communication load is also provided, which matches the communication load achieved by the CDC scheme. As a result, the optimal computation-communication tradeoff in distributed computing is exactly characterized.
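The inverse-proportional tradeoff can be tabulated with a short Python sketch. The abstract does not state the load expressions, so the code assumes the normalized forms $L_{\text{uncoded}}(r) = 1 - r/K$ and $L_{\text{coded}}(r) = \frac{1}{r}(1 - r/K)$ for $K$ nodes and integer computation load $r$; the function names and the choice $K = 10$ are illustrative only.

```python
from fractions import Fraction

def uncoded_load(K, r):
    """Assumed normalized load without coding: the fraction of values not available locally."""
    return 1 - Fraction(r, K)

def coded_load(K, r):
    """Assumed normalized load with coding: the uncoded load divided by r."""
    return uncoded_load(K, r) / r

K = 10  # illustrative number of nodes
for r in range(1, K + 1):
    print(f"r={r:2d}  uncoded={float(uncoded_load(K, r)):.3f}  coded={float(coded_load(K, r)):.3f}")
```

Under these assumed expressions, the ratio of uncoded to coded load equals $r$ for every $r < K$, which is the "same factor" reduction the abstract refers to.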
Finally, the coding techniques of CDC are applied to the Hadoop TeraSort benchmark to develop a novel CodedTeraSort algorithm, which is empirically demonstrated to speed up the overall job execution by $1.97\times$ to $3.39\times$ for typical settings of interest.
Comments: To appear in IEEE Transactions on Information Theory
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:1604.07086 [cs.IT]
  (or arXiv:1604.07086v2 [cs.IT] for this version)

Submission history

From: Songze Li
[v1] Sun, 24 Apr 2016 22:10:44 GMT (297kb,D)
[v2] Sat, 23 Sep 2017 02:47:20 GMT (389kb,D)