Distributed, Parallel, and Cluster Computing
New submissions
New submissions for Fri, 27 Jan 23
 [1] arXiv:2301.10949 [pdf, other]

Title: Compatibility of convergence algorithms for autonomous mobile robots
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
We consider several convergence problems for autonomous mobile robots under the $\cal SSYNC$ model. Let $\Phi$ and $\Pi$ be a set of target functions and a problem, respectively. If the robots whose target functions are chosen from $\Phi$ always solve $\Pi$, we say that $\Phi$ is compatible with respect to $\Pi$. If $\Phi$ is compatible with respect to $\Pi$, every target function $\phi \in \Phi$ is an algorithm for $\Pi$. Note that even if both $\phi$ and $\phi'$ are algorithms for $\Pi$, $\{ \phi, \phi' \}$ may not be compatible with respect to $\Pi$. We investigate the convergence, the fault-tolerant ($n,f$)-convergence (FC($f$)), the fault-tolerant ($n,f$)-convergence to $f$ points (FC($f$)-PO), the fault-tolerant ($n,f$)-convergence to a convex $f$-gon (FC($f$)-CP), and the gathering problem, assuming crash failures. We classify these problems from the viewpoint of compatibility; the group of the convergence, FC(1), FC(1)-PO and FC($f$)-CP, and the group of the gathering and FC($f$)-PO for $f \geq 2$ have completely opposite properties. FC($f$) for $f \geq 2$ is placed in between.
 [2] arXiv:2301.11049 [pdf, other]

Title: Odyssey: A Journey in the Land of Distributed Data Series Similarity Search
Comments: PVLDB 2023
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)
This paper presents Odyssey, a novel distributed data-series processing framework that efficiently addresses the critical challenges of exhibiting good speedup and ensuring high scalability in data series processing by taking advantage of the full computational capacity of modern clusters comprised of multi-core servers. Odyssey addresses a number of challenges in designing an efficient and highly scalable distributed data series index, including efficient scheduling and load balancing without paying the prohibitive cost of moving data around. It also supports a flexible partial replication scheme, which enables Odyssey to navigate through a fundamental trade-off between data scalability and good performance during query answering. Through a wide range of configurations and using several real and synthetic datasets, our experimental analysis demonstrates that Odyssey achieves its challenging goals.
 [3] arXiv:2301.11128 [pdf, other]

Title: A Cloud-Edge Continuum Experimental Methodology Applied to a 5G Core Study
Comments: Published in Springer Nature - Research Book Series: Transactions on Computational Science & Computational Intelligence this https URL (ISSN: 2569-7072)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
There is increasing interest in extending traditional cloud-native technologies, such as Kubernetes, outside the data center to build a continuum towards the edge and in between. However, traditional resource orchestration algorithms do not work well in this case, and it is also difficult to test applications for a heterogeneous cloud infrastructure without actually building it. To address these challenges, we propose a new methodology to aid in deploying, testing, and analyzing the effects of microservice placement and scheduling in a heterogeneous cloud environment. With this methodology, we can investigate any combination of deployment scenarios and monitor metrics in accordance with the placement of microservices in the cloud-edge continuum. Edge devices may be simulated, but as we use Kubernetes, any device which can be attached to a Kubernetes cluster could be used. In order to demonstrate our methodology, we have applied it to the problem of network function placement of an open-source 5G core implementation.
Cross-lists for Fri, 27 Jan 23
 [4] arXiv:2301.10879 (cross-list from cs.LG) [pdf, other]

Title: SuperFed: Weight Shared Federated Learning
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Federated Learning (FL) is a well-established technique for privacy-preserving distributed training. Much attention has been given to various aspects of FL training. A growing number of applications that consume FL-trained models, however, increasingly operate under dynamically and unpredictably variable conditions, rendering a single model insufficient. We argue for training a global family of models cost-efficiently in a federated fashion. Training them independently for different trade-off points incurs $O(k)$ cost for any $k$ architectures of interest, however. Straightforward applications of FL techniques to recent weight-shared training approaches are either infeasible or prohibitively expensive. We propose SuperFed, an architectural framework that incurs $O(1)$ cost to co-train a large family of models in a federated fashion by leveraging weight-shared learning. We achieve an order of magnitude cost savings on both communication and computation by proposing two novel training mechanisms: (a) distribution of weight-shared models to federated clients, (b) central aggregation of arbitrarily overlapping weight-shared model parameters. The combination of these mechanisms is shown to reach an order of magnitude (9.43x) reduction in computation and communication cost for training a $5 \times 10^{18}$-sized family of models, compared to independently training as few as $k = 9$ DNNs without any accuracy loss.
 [5] arXiv:2301.10904 (cross-list from cs.CR) [pdf, other]

Title: GPU-based Private Information Retrieval for On-Device Machine Learning Inference
Authors: Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, Edward Suh
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
On-device machine learning (ML) inference can enable the use of private user data on user devices without remote servers. However, a pure on-device solution to private ML inference is impractical for many applications that rely on embedding tables that are too large to be stored on-device. To overcome this barrier, we propose the use of private information retrieval (PIR) to efficiently and privately retrieve embeddings from servers without sharing any private information during on-device ML inference. As off-the-shelf PIR algorithms are usually too computationally intensive to directly use for latency-sensitive inference tasks, we 1) develop a novel algorithm for accelerating PIR on GPUs, and 2) co-design PIR with the downstream ML application to obtain further speedup. Our GPU acceleration strategy improves system throughput by more than $20 \times$ over an optimized CPU PIR implementation, and our co-design techniques obtain over $5 \times$ additional throughput improvement at fixed model quality. Together, on various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to $100,000$ queries per second (a $>100 \times$ throughput improvement over a naively implemented system) while maintaining model accuracy and limiting inference communication and response latency to within $300$ KB and $<100$ ms, respectively.
 [6] arXiv:2301.10944 (cross-list from econ.GN) [pdf, other]

Title: A Framework of Transaction Packaging in High-throughput Blockchains
Subjects: General Economics (econ.GN); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Computer Science and Game Theory (cs.GT)
We develop a model of coordination and allocation of decentralized multi-sided markets, in which our theoretical analysis shows promise for optimizing the decentralized transaction packaging process at high-throughput blockchains or Web 3.0 platforms. In contrast to the stylized centralized platform, the decentralized platform is powered by blockchain technology, which allows for secure and transparent peer-to-peer transactions among users. Traditional single-chain-based blockchains suffer from the well-known blockchain trilemma. Beyond the single-chain-based scheme, decentralized high-throughput blockchains adopt parallel protocols to reconcile the blockchain trilemma, implementing any tasking and desired allocation. However, non-negligible network latency may induce partial observability, resulting in miscoordination and misallocation issues for the decentralized transaction packaging process in current high-throughput blockchain protocols.
To address this problem, we consider a strategic coordination mechanism for the decentralized transaction packaging process by using a game-theoretic approach. Under a tractable two-period model, we find a Bayesian Nash equilibrium of the miner's strategic transaction packaging under partial observability. Along with novel algorithms for computing equilibrium payoffs, we show that the decentralized platform can achieve an efficient and stable market outcome. The model also highlights that the proposed mechanism can endogenously offer a base fee per gas without any restructuring of the initial blockchain transaction fee mechanism. The theoretical results that underlie the algorithms also imply bounds on the computational complexity of equilibrium payoffs.
 [7] arXiv:2301.11135 (cross-list from cs.LG) [pdf, other]

Title: FedHQL: Federated Heterogeneous Q-Learning
Authors: Flint Xiaofeng Fan, Yining Ma, Zhongxiang Dai, Cheston Tan, Bryan Kian Hsiang Low, Roger Wattenhofer
Comments: Preprint. Under review
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
Federated Reinforcement Learning (FedRL) encourages distributed agents to learn collectively from each other's experience to improve their performance without exchanging their raw trajectories. The existing work on FedRL assumes that all participating agents are homogeneous, which requires all agents to share the same policy parameterization (e.g., network architectures and training configurations). However, in real-world applications, agents are often in disagreement about the architecture and the parameters, possibly also because of disparate computational budgets. Because homogeneity is not given in practice, we introduce the problem setting of Federated Reinforcement Learning with Heterogeneous And bLack-box agEnts (FedRL-HALE). We present the unique challenges this new setting poses and propose the Federated Heterogeneous Q-Learning (FedHQL) algorithm that principally addresses these challenges. We empirically demonstrate the efficacy of FedHQL in boosting the sample efficiency of heterogeneous agents with distinct policy parameterization using standard RL tasks.
 [8] arXiv:2301.11205 (cross-list from cs.DS) [pdf, ps, other]

Title: Deterministic Massively Parallel Symmetry Breaking for Sparse Graphs
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
We consider the problem of designing deterministic graph algorithms for the model of Massively Parallel Computation (MPC) that improve with the sparsity of the input graph, as measured by the notion of arboricity. For the problems of maximal independent set (MIS), maximal matching (MM), and vertex coloring, we improve the state of the art as follows. Let $\lambda$ denote the arboricity of the $n$-node input graph with maximum degree $\Delta$.
MIS and MM: We develop a deterministic fully scalable algorithm that reduces the maximum degree to $poly(\lambda)$ in $O(\log \log n)$ rounds, improving and simplifying the randomized $O(\log \log n)$-round $poly(\max(\lambda, \log n))$-degree reduction of Ghaffari, Grunau, Jin [DISC'20]. Our approach, when combined with the state-of-the-art $O(\log \Delta + \log \log n)$-round algorithm by Czumaj, Davies, Parter [SPAA'20, TALG'21], leads to an improved deterministic round complexity of $O(\log \lambda + \log \log n)$ for MIS and MM in low-space MPC.
We also extend the above MIS and MM algorithms to work with linear global memory. Specifically, we show that both problems can be solved in deterministic time $O(\min(\log n, \log \lambda \cdot \log \log n))$, and even in $O(\log \log n)$ time for graphs with arboricity at most $\log^{O(1)} \log n$. In this setting, only an $O(\log^2 \log n)$ running time bound for trees was known due to Latypov and Uitto [ArXiv'21].
Vertex Coloring: We present an $O(1)$-round deterministic algorithm for the problem of $O(\lambda)$-coloring in linear-memory MPC with relaxed global memory of $n \cdot poly(\lambda)$ that solves the problem after a single graph partitioning step. This matches the state-of-the-art randomized round complexity by Ghaffari and Sayyadi [ICALP'19] and improves upon the deterministic $O(\lambda^{\epsilon})$-round algorithm by Barenboim and Khazanov [CSR'18].
Replacements for Fri, 27 Jan 23
 [9] arXiv:2109.02340 (replaced) [pdf, other]

Title: Khaos: Dynamically Optimizing Checkpointing for Dependable Distributed Stream Processing
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [10] arXiv:2205.11652 (replaced) [pdf, other]

Title: BeeGees: stayin' alive in chained BFT
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)