Distributed, Parallel, and Cluster Computing

New submissions

New submissions for Wed, 15 May 24

[1]  arXiv:2405.08135 [pdf, other]
Title: An Optimal Multilevel Quorum System for Probabilistic Consensus
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Discrete Mathematics (cs.DM); Probability (math.PR)

We present the notion of a multilevel, slashable quorum system, where an application can obtain gradual levels of assurance that a certain value is bound to be decided (or "finalized") in a global consensus procedure, unless a large number of Byzantine processes are exposed to slashing (that is, penalty on staked assets). Our construction is a highly parameterized generalization of quorum systems based on finite projective spaces, with asymptotic high availability and optimal slashing properties. In particular, we show that any quorum system whose ground elements are disjoint subsets of nodes (e.g. "committees" in committee-based consensus protocols) has asymptotic high availability under very reasonable conditions, a general proof with significance of its own. Under similarly relaxed conditions, we show that our construction has asymptotically optimal slashing properties with respect to message complexity and process load; this illustrates a fundamental trade-off between message complexity, load, and slashing. Our multilevel construction allows nodes to decide how many "levels" of finalization assurance they wish to obtain, noting that this functionality, if applied to a proof-of-stake blockchain, can be seen either as (i) a form of an early, slashing-based, probabilistic block finalization; or (ii) a service for reorg tolerance.

[2]  arXiv:2405.08187 [pdf, other]
Title: Optimizing Task Scheduling in Heterogeneous Computing Environments: A Comparative Analysis of CPU, GPU, and ASIC Platforms Using E2C Simulator
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS)

Efficient task scheduling in heterogeneous computing environments is imperative for optimizing resource utilization and minimizing task completion times. In this study, we conducted a comprehensive benchmarking analysis to evaluate the performance of four scheduling algorithms: First-Come, First-Served (FCFS), FCFS with No Queuing (FCFS-NQ), Minimum Expected Completion Time (MECT), and Minimum Expected Execution Time (MEET), across varying workload scenarios. We defined three workload scenarios: low, medium, and high, each representing different levels of computational demands. Through rigorous experimentation and analysis, we assessed the effectiveness of each algorithm in terms of total completion percentage, energy consumption, wasted energy, and energy per completion. Our findings highlight the strengths and limitations of each algorithm, with MECT and MEET emerging as robust contenders, dynamically prioritizing tasks based on comprehensive estimates of completion and execution times. Furthermore, MECT and MEET exhibit superior energy efficiency compared to FCFS and FCFS-NQ, underscoring their suitability for resource-constrained environments. This study provides valuable insights into the efficacy of task scheduling algorithms in heterogeneous computing environments, enabling informed decision-making to enhance resource allocation, minimize task completion times, and improve energy efficiency.
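
The difference between the two estimate-based policies named in this abstract can be illustrated with a minimal sketch. It assumes the usual textbook definitions (MEET picks the machine with the smallest expected execution time and ignores queueing; MECT also accounts for when each machine becomes free). The `Machine` class, the `eet` table, and all numbers are hypothetical and are not taken from the paper or from the E2C simulator.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    ready_time: float = 0.0  # time at which the machine's queue drains


def meet(task_type, machines, eet):
    """Minimum Expected Execution Time: fastest machine, queue ignored."""
    return min(machines, key=lambda m: eet[task_type][m.name])


def mect(task_type, machines, eet, now=0.0):
    """Minimum Expected Completion Time: earliest finish, queue included."""
    return min(
        machines,
        key=lambda m: max(m.ready_time, now) + eet[task_type][m.name],
    )


# Hypothetical expected-execution-time table (seconds) for one task type.
eet = {"fft": {"cpu": 8.0, "gpu": 2.0, "asic": 1.0}}
machines = [
    Machine("cpu", ready_time=0.0),
    Machine("gpu", ready_time=1.0),
    Machine("asic", ready_time=10.0),  # fastest, but heavily queued
]

print(meet("fft", machines, eet).name)  # asic: 1.0 s execution time
print(mect("fft", machines, eet).name)  # gpu: finishes at 1.0 + 2.0 = 3.0 s
```

In this toy setting MEET sends the task to the busy ASIC, while MECT prefers the GPU because it finishes sooner overall; how the paper's implementation breaks ties or models energy is not stated in the abstract.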

[3]  arXiv:2405.08411 [pdf, other]
Title: Large-Scale Metric Computation in Online Controlled Experiment Platform
Authors: Tao Xiong, Yong Wang
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The online controlled experiment (also called an A/B test or simply an experiment) is the most important tool for decision-making at a wide range of data-driven companies such as Microsoft, Google, and Meta. Metric computation is the core procedure for reaching a conclusion during an experiment. With the growth of experiments and metrics in an experiment platform, computing metrics efficiently at scale becomes a non-trivial challenge. This work shows how metric computation in the WeChat experiment platform can be done efficiently using bit-sliced index (BSI) arithmetic. This approach has been implemented in a real-world system and the performance results are presented, showing that the BSI arithmetic approach is very suitable for large-scale metric computation scenarios.
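
As a rough illustration of why bit-sliced index arithmetic suits metric aggregation, the sketch below computes a filtered sum using only bitwise ANDs and population counts. It is a generic BSI sum over non-negative integers, not the WeChat implementation, which the abstract does not describe; the function names `build_bsi` and `bsi_sum` and the use of Python integers as bitmaps are assumptions made for this example.

```python
def build_bsi(values):
    """Build bit slices: bit r of slice i is set iff bit i of values[r] is set.

    Values must be non-negative integers; Python ints serve as bitmaps.
    """
    nbits = max(values).bit_length() if values else 0
    slices = [0] * nbits
    for row, v in enumerate(values):
        for i in range(nbits):
            if (v >> i) & 1:
                slices[i] |= 1 << row
    return slices


def bsi_sum(slices, mask):
    """Sum the values of the rows selected by `mask` (bit r selects row r)."""
    return sum(
        (1 << i) * bin(s & mask).count("1")  # 2^i times popcount of slice i
        for i, s in enumerate(slices)
    )


# Hypothetical per-user metric values; rows 0, 1 and 3 are in the treatment group.
values = [3, 5, 0, 7, 2]
slices = build_bsi(values)
treatment = 0b01011

print(bsi_sum(slices, treatment))         # 3 + 5 + 7 = 15
print(bsi_sum(slices, (1 << 5) - 1))      # all rows: 17
```

The cost of the sum depends on the number of bit slices rather than on a per-row loop, which is the property that makes this representation attractive when the same metric is aggregated over many experiment groups.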

[4]  arXiv:2405.08470 [pdf, other]
Title: Sparse MTTKRP Acceleration for Tensor Decomposition on GPU
Comments: In 21st ACM International Conference on Computing Frontiers (CF '24), May 7-9, 2024, Ischia, Italy
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)

Sparse Matricized Tensor Times Khatri-Rao Product (spMTTKRP) is the bottleneck kernel of sparse tensor decomposition. In this work, we propose a GPU-based algorithm design to address the key challenges in accelerating spMTTKRP computation, including (1) eliminating global atomic operations across GPU thread blocks, (2) avoiding the intermediate values being communicated between GPU thread blocks and GPU global memory, and (3) ensuring a balanced distribution of workloads across GPU thread blocks. Our approach also supports dynamic tensor remapping, enabling the above optimizations in all the modes of the input tensor. Our approach achieves a geometric mean speedup of 1.5x, 2.0x, and 21.7x in total execution time across widely used datasets compared with the state-of-the-art GPU implementations. Our work is the only GPU implementation that can support tensors with modes greater than 4 since the state-of-the-art works have implementation constraints for tensors with a large number of modes.

[5]  arXiv:2405.08637 [pdf, other]
Title: Drift Detection: Introducing Gaussian Split Detector
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Recent research has yielded a wide array of drift detectors. However, in order to achieve remarkable performance, the true class labels must be available during the drift detection phase. This paper targets drift detection when the ground truth is unknown during the detection phase. To that end, we introduce Gaussian Split Detector (GSD), a novel drift detector that works in batch mode. GSD is designed to work when the data follow a normal distribution and makes use of Gaussian mixture models to monitor changes in the decision boundary. The algorithm is designed to handle multi-dimensional data streams and to work without the ground truth labels during the inference phase, making it pertinent for real-world use. In an extensive experimental study on real and synthetic datasets, we evaluate our detector against the state of the art. We show that our detector outperforms the state of the art in detecting real drift and in ignoring virtual drift, which is key to avoiding false alarms.

[6]  arXiv:2405.08651 [pdf, other]
Title: BeACONS: A Blockchain-enabled Authentication and Communications Network for Scalable IoV
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

This paper introduces a novel blockchain-enabled authentication and communications network for scalable Internet of Vehicles, which aims to bolster security and confidentiality, diminish communications latency, and reduce dependence on centralised infrastructures like Certificate Authorities and Public Key Infrastructures by leveraging Blockchain-enabled Domain Name Services and Blockchain-enabled Mutual Authentication. The proposed network is structured into a primary layer, consisting of Road Side Units and edge servers as servers of Blockchain-enabled Domain Name Services for managing inter-vehicle communications identities, and a sub-layer within each vehicle for intra-vehicle communications via the Blockchain-enabled Mutual Authentication Protocol. This design facilitates secure connections across vehicles by coordinating between the layers, significantly improving communications security and efficiency. This study also evaluates Road Side Unit availability against the random distribution of Road Side Units along the route of different vehicles. The proposed model presents a novel pathway towards a decentralised, secure, and efficient Internet of Vehicles ecosystem, contributing to the advancement of autonomous and trustworthy vehicular networks.

[7]  arXiv:2405.08663 [pdf, other]
Title: D-CAST: Distributed Consensus Switch in Wireless Trustworthy Autonomous System
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Distributed consensus protocols normally aim to tolerate different types of faults, including crash faults and Byzantine faults, that occur in distributed systems. However, the dynamic network topology and stochastic wireless channels may cause the same trustworthy system to suffer both crash faults and Byzantine faults. This article proposes the concept of a distributed consensus autonomous switch mechanism in trustworthy autonomous systems (D-CAST) to meet the different fault tolerance requirements of the dynamic nodes and discusses the challenges of D-CAST when it is implemented in a wireless trustworthy system.

[8]  arXiv:2405.08754 [pdf, ps, other]
Title: Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach
Comments: Published in: 2023 IEEE International Conference on Cluster Computing (CLUSTER)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR); Machine Learning (cs.LG)

GPU-based heterogeneous architectures are now commonly used in HPC clusters. Due to their architectural simplicity specialized for data-level parallelism, GPUs can offer much higher computational throughput and memory bandwidth than CPUs in the same generation do. However, as the available resources in GPUs have increased exponentially over the past decades, it has become increasingly difficult for a single program to fully utilize them. As a consequence, the industry has started supporting several resource partitioning features in order to improve the resource utilization by co-scheduling multiple programs on the same GPU die at the same time. Driven by the technological trend, this paper focuses on hierarchical resource partitioning on modern GPUs, and as an example, we utilize a combination of two different features available on recent NVIDIA GPUs in a hierarchical manner: MPS (Multi-Process Service), a finer-grained logical partitioning; and MIG (Multi-Instance GPU), a coarse-grained physical partitioning. We propose a method for comprehensively co-optimizing the setup of hierarchical partitioning and the selection of co-scheduling groups from a given set of jobs, based on reinforcement learning using their profiles. Our thorough experimental results demonstrate that our approach can successfully set up job concurrency, partitioning, and co-scheduling group selections simultaneously. This results in a maximum throughput improvement by a factor of 1.87 compared to the time-sharing scheduling.

Cross-lists for Wed, 15 May 24

[9]  arXiv:2405.08268 (cross-list from cs.CR) [pdf, other]
Title: T-Watch: Towards Timed Execution of Private Transaction in Blockchains
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

In blockchains such as Bitcoin and Ethereum, transactions represent the primary mechanism that the external world can use to trigger a change of blockchain state. Transactions serve as key sources of evidence and play a vital role in forensic analysis. Timed transaction refers to a specific class of service that enables a user to schedule a transaction to change the blockchain state during a chosen future time-frame. This paper proposes T-Watch, a decentralized and cost-efficient approach for users to schedule timed execution of any type of transaction in Ethereum with privacy guarantees. T-Watch employs a novel combination of threshold secret sharing and decentralized smart contracts. To protect the private elements of a scheduled transaction from being disclosed before the future time-frame, T-Watch maintains shares of the decryption key of the scheduled transaction using a group of executors recruited in a blockchain network before the specified future time-frame and restores the scheduled transaction at a proxy smart contract to trigger the change of blockchain state at the required time-frame. To reduce the cost of smart contract execution in T-Watch, we carefully design the proposed protocol to run in an optimistic mode by default and then switch to a pessimistic mode once misbehaviors occur. Furthermore, the protocol lets users pool service requests to further reduce the gas cost. We rigorously analyze the security of T-Watch and implement the protocol over the Ethereum official test network. The results demonstrate that T-Watch is more scalable compared to the state of the art and could reduce the cost by over 90% through pooling.
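
The idea of holding a decryption key as shares among a group of executors can be sketched with Shamir's (k, n) scheme, one common form of threshold secret sharing. The abstract does not say which scheme T-Watch uses, so treating it as Shamir's is an assumption, as are the field modulus, the function names `split` and `recover`, and the example key value; the on-chain parts of the protocol are not modeled.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as the field modulus (an assumption)


def split(secret, n, k):
    """Split `secret` into n shares such that any k of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def poly(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        inv_den = pow(den, PRIME - 2, PRIME)  # modular inverse via Fermat
        secret = (secret + yj * num * inv_den) % PRIME
    return secret


key = 123456789                      # stand-in for a decryption key
shares = split(key, n=5, k=3)        # five executors, any three suffice
print(recover(shares[:3]) == key)    # True
print(recover(shares[2:]) == key)    # True
```

With a threshold of three, any two colluding executors learn nothing about the key, while any three can reconstruct it when the scheduled time-frame arrives.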

[10]  arXiv:2405.08297 (cross-list from cs.LG) [pdf, ps, other]
Title: Distance-Restricted Explanations: Theoretical Underpinnings & Efficient Implementation
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)

The uses of machine learning (ML) have snowballed in recent years. In many cases, ML models are highly complex, and their operation is beyond the understanding of human decision-makers. Nevertheless, some uses of ML models involve high-stakes and safety-critical applications. Explainable artificial intelligence (XAI) aims to help human decision-makers in understanding the operation of such complex ML models, thus eliciting trust in their operation. Unfortunately, the majority of past XAI work is based on informal approaches that offer no guarantees of rigor. Unsurprisingly, there exists comprehensive experimental and theoretical evidence confirming that informal methods of XAI can provide human decision-makers with erroneous information. Logic-based XAI represents a rigorous approach to explainability; it is model-based and offers the strongest guarantees of rigor of computed explanations. However, a well-known drawback of logic-based XAI is the complexity of logic reasoning, especially for highly complex ML models. Recent work proposed distance-restricted explanations, i.e. explanations that are rigorous provided the distance to a given input is small enough. Distance-restricted explainability is closely related to adversarial robustness, and it has been shown to scale for moderately complex ML models, but the number of inputs still represents a key limiting factor. This paper investigates novel algorithms for scaling up the performance of logic-based explainers when computing and enumerating ML model explanations with a large number of inputs.

[11]  arXiv:2405.08395 (cross-list from cs.CR) [pdf, other]
Title: Cross-Blockchain Communication Using Oracles With an Off-Chain Aggregation Mechanism Based on zk-SNARKs
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

The closed architecture of prevailing blockchain systems renders the usage of this technology mostly infeasible for a wide range of real-world problems. Most blockchains trap users and applications in their isolated space without the possibility of cooperating or switching to other blockchains. Therefore, blockchains need additional mechanisms for seamless communication and arbitrary data exchange between each other and external systems. Unfortunately, current approaches for cross-blockchain communication are resource-intensive or require additional blockchains or tailored solutions depending on the applied consensus mechanisms of the connected blockchains. Therefore, we propose an oracle with an off-chain aggregation mechanism based on Zero-Knowledge Succinct Non-interactive Arguments of Knowledge (zk-SNARKs) to facilitate cross-blockchain communication. The oracle queries data from another blockchain and applies a rollup-like mechanism to move state and computation off-chain. The zkOracle contract only expects the transferred data, an updated state root, and proof of the correct execution of the aggregation mechanism. The proposed solution only requires a constant 378 kgas to submit data on the Ethereum blockchain and is primarily independent of the underlying technology of the queried blockchains.

[12]  arXiv:2405.08698 (cross-list from cs.IT) [pdf, other]
Title: Byzantine-Resilient Secure Aggregation for Federated Learning Without Privacy Compromises
Subjects: Information Theory (cs.IT); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Federated learning (FL) shows great promise in large scale machine learning, but brings new risks in terms of privacy and security. We propose ByITFL, a novel scheme for FL that provides resilience against Byzantine users while keeping the users' data private from the federator and private from other users. The scheme builds on the preexisting non-private FLTrust scheme, which tolerates malicious users through trust scores (TS) that attenuate or amplify the users' gradients. The trust scores are based on the ReLU function, which we approximate by a polynomial. The distributed and privacy-preserving computation in ByITFL is designed using a combination of Lagrange coded computing, verifiable secret sharing and re-randomization steps. ByITFL is the first Byzantine resilient scheme for FL with full information-theoretic privacy.

Replacements for Wed, 15 May 24

[13]  arXiv:2306.11177 (replaced) [pdf, other]
Title: Pipit: Scripting the analysis of parallel execution traces
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[14]  arXiv:2306.17281 (replaced) [pdf, other]
Title: HPC-Coder: Modeling Parallel Programs using Large Language Models
Journal-ref: ISC High Performance 2024 Research Paper Proceedings (39th International Conference), Hamburg, Germany, 2024, pp. 1-12
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
[15]  arXiv:2401.12554 (replaced) [pdf, other]
Title: Can Large Language Models Write Parallel Code?
Journal-ref: The 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC '24), June 3-7, 2024, Pisa, Italy. ACM, New York, NY, USA, 14 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
[16]  arXiv:2405.05329 (replaced) [pdf, other]
Title: KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Comments: preprint for ICML 2024
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[17]  arXiv:2305.13525 (replaced) [pdf, other]
Title: A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[18]  arXiv:2403.06898 (replaced) [pdf, other]
Title: SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions
Comments: DaMoN 2024
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
[19]  arXiv:2403.17400 (replaced) [pdf, other]
Title: A Survey on Resource Management in Joint Communication and Computing-Embedded SAGIN
Comments: 43 pages, 17 figures
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC)
[20]  arXiv:2404.15735 (replaced) [pdf, other]
Title: On Replacing Cryptopuzzles with Useful Computation in Blockchain Proof-of-Work Protocols
Comments: Submitted to ACM Computing Surveys
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)