Title: Creating Robust Deep Neural Networks With Coded Distributed Computing for IoT Systems

Abstract: The increasing interest in serverless computation and ubiquitous wireless networks has led to numerous connected devices in our surroundings. Among these, IoT devices have access to an abundance of raw data, but their inadequate computing resources limit their capabilities. Specifically, with the emergence of deep neural networks (DNNs), not only is the demand for computing power on IoT devices increasing, but privacy concerns are also pushing computations towards the edge. To overcome inadequate resources, several studies have proposed distributing work among IoT devices. These promising methods harvest the aggregated computing power of the idle IoT devices in an environment. However, because such a distributed system relies strongly on every device, the unstable latencies and intermittent failures that are common characteristics of IoT devices and wireless networks cause high recovery overheads. To reduce this overhead, we propose a novel robustness method with close-to-zero recovery latency for DNN computations. Our solution never loses a request or spends time recovering from a failure. To do so, we first analyze the underlying matrix-matrix computations affected by distribution. Then, we introduce a new coded distributed computing (CDC) method whose cost stays constant as the number of devices increases, unlike the linear cost of modular redundancies. Moreover, our method is applied at the library level, without requiring extensive changes to the program, while still ensuring a balanced work assignment during distribution. To illustrate our method, we perform experiments with distributed systems comprising up to 12 Raspberry Pis.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2104.04447 [cs.DC]
  (or arXiv:2104.04447v1 [cs.DC] for this version)
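
The abstract does not spell out the coding scheme, so the following is only a generic illustration of the idea behind coded distributed computing, not the authors' method. It assumes a single-parity code over a row-split matrix-vector product: the weight rows are divided among the workers, and one extra worker holds the element-wise sum of all blocks. If any one worker fails, its partial result can be rebuilt by subtraction instead of recomputation. Every name here (matvec, split_rows, parity_block, coded_matvec) is hypothetical.

```python
# Illustrative sketch only: single-parity coded matrix-vector product.
# Function names and the coding scheme are assumptions, not the paper's API.

def matvec(rows, x):
    """Multiply a list of rows by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]

def split_rows(weights, n_workers):
    """Split the weight matrix into n_workers equal row blocks."""
    if len(weights) % n_workers != 0:
        raise ValueError("row count must be divisible by n_workers")
    size = len(weights) // n_workers
    return [weights[i * size:(i + 1) * size] for i in range(n_workers)]

def parity_block(blocks):
    """Element-wise sum of all blocks, held by one extra worker."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*blocks)]

def coded_matvec(weights, x, n_workers, failed=None):
    """Compute weights @ x, tolerating the loss of at most one worker."""
    blocks = split_rows(weights, n_workers)
    parity = parity_block(blocks)
    # Each worker computes its partial product; a failed worker yields None.
    partials = [None if i == failed else matvec(b, x)
                for i, b in enumerate(blocks)]
    if failed is not None:
        # The parity output equals the sum of all partial outputs, so the
        # missing partial is the parity output minus the surviving partials.
        parity_out = matvec(parity, x)
        survivors = [p for p in partials if p is not None]
        partials[failed] = [p - sum(s[j] for s in survivors)
                            for j, p in enumerate(parity_out)]
    return [y for part in partials for y in part]
```

In this sketch the redundancy is one extra worker no matter how many workers share the product, whereas duplicating every worker would add one replica per worker. Calling coded_matvec(weights, x, 2, failed=1) returns the same vector as the failure-free call.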

Submission history

From: Ramyad Hadidi
[v1] Fri, 9 Apr 2021 15:52:35 GMT (4410kb,D)
