Distributed, Parallel, and Cluster Computing

New submissions

New submissions for Fri, 27 Jan 23

[1]  arXiv:2301.10949 [pdf, other]
Title: Compatibility of convergence algorithms for autonomous mobile robots
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

We consider several convergence problems for autonomous mobile robots under the $\cal SSYNC$ model. Let $\Phi$ and $\Pi$ be a set of target functions and a problem, respectively. If the robots whose target functions are chosen from $\Phi$ always solve $\Pi$, we say that $\Phi$ is compatible with respect to $\Pi$. If $\Phi$ is compatible with respect to $\Pi$, every target function $\phi \in \Phi$ is an algorithm for $\Pi$. Note that even if both $\phi$ and $\phi'$ are algorithms for $\Pi$, $\{ \phi, \phi' \}$ may not be compatible with respect to $\Pi$. We investigate convergence, fault-tolerant ($n,f$)-convergence (FC($f$)), fault-tolerant ($n,f$)-convergence to $f$ points (FC($f$)-PO), fault-tolerant ($n,f$)-convergence to a convex $f$-gon (FC($f$)-CP), and gathering, assuming crash failures. We classify these problems from the viewpoint of compatibility: the group consisting of convergence, FC(1), FC(1)-PO, and FC($f$)-CP and the group consisting of gathering and FC($f$)-PO for $f \geq 2$ have completely opposite properties, while FC($f$) for $f \geq 2$ is placed in between.

[2]  arXiv:2301.11049 [pdf, other]
Title: Odyssey: A Journey in the Land of Distributed Data Series Similarity Search
Comments: PVLDB 2023
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)

This paper presents Odyssey, a novel distributed data-series processing framework that achieves good speedup and high scalability in data series processing by taking advantage of the full computational capacity of modern clusters composed of multi-core servers. Odyssey addresses a number of challenges in designing an efficient and highly scalable distributed data series index, including efficient scheduling and load balancing without paying the prohibitive cost of moving data around. It also supports a flexible partial replication scheme, which enables Odyssey to navigate a fundamental trade-off between data scalability and good performance during query answering. Through a wide range of configurations and using several real and synthetic datasets, our experimental analysis demonstrates that Odyssey achieves its challenging goals.

[3]  arXiv:2301.11128 [pdf, other]
Title: A Cloud-Edge Continuum Experimental Methodology Applied to a 5G Core Study
Comments: Published in Springer Nature - Research Book Series: Transactions on Computational Science & Computational Intelligence (ISSN: 2569-7072)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

There is increasing interest in extending traditional cloud-native technologies, such as Kubernetes, outside the data center to build a continuum that reaches the edge and everything in between. However, traditional resource orchestration algorithms do not work well in this case, and it is also difficult to test applications for a heterogeneous cloud infrastructure without actually building it. To address these challenges, we propose a new methodology to aid in deploying, testing, and analyzing the effects of microservice placement and scheduling in a heterogeneous cloud environment. With this methodology, we can investigate any combination of deployment scenarios and monitor metrics in accordance with the placement of microservices in the cloud-edge continuum. Edge devices may be simulated, but because we use Kubernetes, any device that can be attached to a Kubernetes cluster could be used. To demonstrate our methodology, we have applied it to the problem of network function placement in an open-source 5G core implementation.

Cross-lists for Fri, 27 Jan 23

[4]  arXiv:2301.10879 (cross-list from cs.LG) [pdf, other]
Title: SuperFed: Weight Shared Federated Learning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated Learning (FL) is a well-established technique for privacy-preserving distributed training. Much attention has been given to various aspects of FL training. A growing number of applications that consume FL-trained models, however, increasingly operate under dynamically and unpredictably variable conditions, rendering a single model insufficient. We argue for training a global family of models cost-efficiently in a federated fashion. Training them independently for different tradeoff points incurs $O(k)$ cost for any $k$ architectures of interest, however. Straightforward applications of FL techniques to recent weight-shared training approaches are either infeasible or prohibitively expensive. We propose SuperFed, an architectural framework that incurs $O(1)$ cost to co-train a large family of models in a federated fashion by leveraging weight-shared learning. We achieve an order of magnitude cost savings on both communication and computation by proposing two novel training mechanisms: (a) distribution of weight-shared models to federated clients, and (b) central aggregation of arbitrarily overlapping weight-shared model parameters. The combination of these mechanisms is shown to reach an order of magnitude (9.43x) reduction in computation and communication cost for training a $5 \times 10^{18}$-sized family of models, without any accuracy loss, compared to independently training as few as $k = 9$ DNNs.

[5]  arXiv:2301.10904 (cross-list from cs.CR) [pdf, other]
Title: GPU-based Private Information Retrieval for On-Device Machine Learning Inference
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

On-device machine learning (ML) inference can enable the use of private user data on user devices without remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information during on-device ML inference. As off-the-shelf PIR algorithms are usually too computationally intensive to use directly for latency-sensitive inference tasks, we 1) develop a novel algorithm for accelerating PIR on GPUs, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than $20 \times$ over an optimized CPU PIR implementation, and our co-design techniques obtain over $5 \times$ additional throughput improvement at fixed model quality. Together, on various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to $100{,}000$ queries per second -- a $>100 \times$ throughput improvement over a naively implemented system -- while maintaining model accuracy and limiting inference communication to within $300$ KB and response latency to under $100$ ms.

[6]  arXiv:2301.10944 (cross-list from econ.GN) [pdf, other]
Title: A Framework of Transaction Packaging in High-throughput Blockchains
Subjects: General Economics (econ.GN); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)

We develop a model of coordination and allocation in decentralized multi-sided markets, and our theoretical analysis shows promise for optimizing the decentralized transaction packaging process in high-throughput blockchains or Web 3.0 platforms. In contrast to the stylized centralized platform, the decentralized platform is powered by blockchain technology, which allows for secure and transparent peer-to-peer transactions among users. Traditional single-chain-based blockchains suffer from the well-known blockchain trilemma. Beyond the single-chain-based scheme, decentralized high-throughput blockchains adopt parallel protocols to reconcile the blockchain trilemma, implementing any tasking and desired allocation. However, non-negligible network latency may induce partial observability, resulting in miscoordination and misallocation in the decentralized transaction packaging process of current high-throughput blockchain protocols.
To address this problem, we consider a strategic coordination mechanism for the decentralized transaction packaging process, using a game-theoretic approach. Under a tractable two-period model, we find a Bayesian Nash equilibrium of the miner's strategic transaction packaging under partial observability. Along with novel algorithms for computing equilibrium payoffs, we show that the decentralized platform can achieve an efficient and stable market outcome. The model also highlights that the proposed mechanism can endogenously offer a base fee per gas without any restructuring of the initial blockchain transaction fee mechanism. The theoretical results that underlie the algorithms also imply bounds on the computational complexity of equilibrium payoffs.

[7]  arXiv:2301.11135 (cross-list from cs.LG) [pdf, other]
Title: FedHQL: Federated Heterogeneous Q-Learning
Comments: Preprint. Under review
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated Reinforcement Learning (FedRL) encourages distributed agents to learn collectively from each other's experience to improve their performance without exchanging their raw trajectories. Existing work on FedRL assumes that all participating agents are homogeneous, which requires all agents to share the same policy parameterization (e.g., network architectures and training configurations). However, in real-world applications, agents often differ in architecture and parameters, possibly because of disparate computational budgets. Because homogeneity is not given in practice, we introduce the problem setting of Federated Reinforcement Learning with Heterogeneous And bLack-box agEnts (FedRL-HALE). We present the unique challenges this new setting poses and propose the Federated Heterogeneous Q-Learning (FedHQL) algorithm, which addresses these challenges in a principled manner. We empirically demonstrate the efficacy of FedHQL in boosting the sample efficiency of heterogeneous agents with distinct policy parameterization using standard RL tasks.

[8]  arXiv:2301.11205 (cross-list from cs.DS) [pdf, ps, other]
Title: Deterministic Massively Parallel Symmetry Breaking for Sparse Graphs
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

We consider the problem of designing deterministic graph algorithms for the model of Massively Parallel Computation (MPC) that improve with the sparsity of the input graph, as measured by the notion of arboricity. For the problems of maximal independent set (MIS), maximal matching (MM), and vertex coloring, we improve the state of the art as follows. Let $\lambda$ denote the arboricity of the $n$-node input graph with maximum degree $\Delta$.
MIS and MM: We develop a deterministic fully-scalable algorithm that reduces the maximum degree to $poly(\lambda)$ in $O(\log \log n)$ rounds, improving and simplifying the randomized $O(\log \log n)$-round $poly(\max(\lambda, \log n))$-degree reduction of Ghaffari, Grunau, and Jin [DISC'20]. Our approach, when combined with the state-of-the-art $O(\log \Delta + \log \log n)$-round algorithm by Czumaj, Davies, and Parter [SPAA'20, TALG'21], leads to an improved deterministic round complexity of $O(\log \lambda + \log \log n)$ for MIS and MM in low-space MPC.
We also extend the above MIS and MM algorithms to work with linear global memory. Specifically, we show that both problems can be solved in deterministic time $O(\min(\log n, \log \lambda \cdot \log \log n))$, and even in $O(\log \log n)$ time for graphs with arboricity at most $\log^{O(1)} \log n$. In this setting, only an $O(\log^2 \log n)$ running-time bound for trees was known, due to Latypov and Uitto [ArXiv'21].
Vertex Coloring: We present an $O(1)$-round deterministic algorithm for the problem of $O(\lambda)$-coloring in linear-memory MPC with relaxed global memory of $n \cdot poly(\lambda)$ that solves the problem after a single graph partitioning step. This matches the state-of-the-art randomized round complexity by Ghaffari and Sayyadi [ICALP'19] and improves upon the deterministic $O(\lambda^{\epsilon})$-round algorithm by Barenboim and Khazanov [CSR'18].

Replacements for Fri, 27 Jan 23

[9]  arXiv:2109.02340 (replaced) [pdf, other]
Title: Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[10]  arXiv:2205.11652 (replaced) [pdf, other]
Title: BeeGees: stayin' alive in chained BFT
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)