References & Citations
Computer Science > Distributed, Parallel, and Cluster Computing
Title: Bottleneck Time Minimization for Distributed Iterative Processes: Speeding Up Gossip-Based Federated Learning on Networked Computers
(Submitted on 29 Jun 2021)
Abstract: We present a novel task scheduling scheme for accelerating computational applications involving distributed iterative processes that are executed on networked computing resources. Such an application consists of multiple tasks, each of which outputs data at each iteration to be processed by neighboring tasks; these dependencies between the tasks can be represented as a directed graph. We first mathematically formulate the problem as a Binary Quadratic Program (BQP), accounting for both computation and communication costs. We show that the problem is NP-hard. We then relax the problem as a Semi-Definite Program (SDP) and utilize a randomized rounding technique based on sampling from a suitably-formulated multi-variate Gaussian distribution. Furthermore, we derive the expected value of bottleneck time. Finally, we apply our proposed scheme on gossip-based federated learning as an application of iterative processes. Through numerical evaluations on the MNIST and CIFAR-10 datasets, we show that our proposed approach outperforms well-known scheduling techniques from distributed computing. In particular, for arbitrary settings, we show that it reduces bottleneck time by $91\%$ compared to HEFT and $84\%$ compared to throughput HEFT.
Link back to: arXiv, form interface, contact.