We gratefully acknowledge support from
the Simons Foundation and member institutions.

Distributed, Parallel, and Cluster Computing

New submissions

[ total of 19 entries: 1-19 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Fri, 28 Jan 22

[1]  arXiv:2201.11124 [pdf, other]
Title: Using Hybrid Scheduling Algorithms for Solving Blockchain Allocation on Cloud
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Companies are rushing to deliver their services and solutions through the cloud. The scheduling process is very critical in reducing delays. Scheduling also has a role in accessing resources without excessive waiting time. All this in context of modern advances in infrastructure and the emergence of Blockchain-as-a-service. What if integration is done between a hybrid scheduling algorithm and blockchain technology via the cloud. This integration aims to enhance and provide the service uninterruptedly. This method is distinguished, compared to other scheduling algorithms such as shortest-job-first and priority scheduling, that it does not suffer from starvation and it has a balanced load on resources. Based on analytical performance, the proposed hybrid scheduling has the markable result.

[2]  arXiv:2201.11216 [pdf, other]
Title: Serverless Architecture for Bulk Email Management
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Sending emails in large quantities can be tedious considering free services do not cover bulk email and paid services can be costly and are not easy to customize. Traditional email client used for basic emailing services fail to be useful in larger volumes of emails to target people or spread information to consented individuals. This paper proposes a serverless architecture to tackle such problems by using one such offering from the Amazon Web Services(AWS) API which can be easily replaced by a software architects choice of service. The constraints help to make an architecture using components that can fit most of the needs of a serverless backend and extend it to scenarios such mobile notifications, One Time Password (OTP) systems or other means of communication to minimize single point of failure and also decrease the dependency on physical servers for such operations offering a comparable solution within the cloud. The architecture proposed is tested to find the time taken to send the emails of various quantities and see how it affects the cost. The architecture was successful able to send multiple emails in a quick and single invocation and has demonstrated a higher level of scalability compared to conventional methods.

[3]  arXiv:2201.11247 [pdf, other]
Title: Data-Quality Based Scheduling for Federated Edge Learning
Comments: 2021 IEEE 46th Conference on Local Computer Networks (LCN)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

FEderated Edge Learning (FEEL) has emerged as a leading technique for privacy-preserving distributed training in wireless edge networks, where edge devices collaboratively train machine learning (ML) models with the orchestration of a server. However, due to frequent communication, FEEL needs to be adapted to the limited communication bandwidth. Furthermore, the statistical heterogeneity of local datasets' distributions, and the uncertainty about the data quality pose important challenges to the training's convergence. Therefore, a meticulous selection of the participating devices and an analogous bandwidth allocation are necessary. In this paper, we propose a data-quality based scheduling (DQS) algorithm for FEEL. DQS prioritizes reliable devices with rich and diverse datasets. In this paper, we define the different components of the learning algorithm and the data-quality evaluation. Then, we formulate the device selection and the bandwidth allocation problem. Finally, we present our DQS algorithm for FEEL, and we evaluate it in different data poisoning scenarios.

[4]  arXiv:2201.11275 [pdf, other]
Title: Wireless IoT Energy Sharing Platform
Comments: 3 pages, 3 figures, PERCOM 2022 , Demo Paper
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Wireless energy sharing is a novel convenient alternative to charge IoT devices. In this demo paper, we present a peer-to-peer wireless energy sharing platform. The platform enables users to exchange energy wirelessly with nearby IoT devices. The energy sharing platform allows IoT users to send and receive energy wirelessly. The platform consists of (i) a mobile application that monitors and synchronizes the energy transfer among two IoT devices and (ii) and a backend to register energy providers and consumers and store their energy transfer transactions. The eveloped framework allows the collection of a real wireless energy sharing dataset. A set of preliminary experiments has been conducted on the collected dataset to analyze and demonstrate the behavior of the current wireless energy sharing technology.

[5]  arXiv:2201.11326 [pdf, other]
Title: High-order Line Graphs of Non-uniform Hypergraphs: Algorithms, Applications, and Experimental Analysis
Comments: Accepted at "36th IEEE International Parallel & Distributed Processing Symposium (IPDPS '22)"
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Hypergraphs offer flexible and robust data representations for many applications, but methods that work directly on hypergraphs are not readily available and tend to be prohibitively expensive. Much of the current analysis of hypergraphs relies on first performing a graph expansion -- either based on the nodes (clique expansion), or on the edges (line graph) -- and then running standard graph analytics on the resulting representative graph. However, this approach suffers from massive space complexity and high computational cost with increasing hypergraph size. Here, we present efficient, parallel algorithms to accelerate and reduce the memory footprint of higher-order graph expansions of hypergraphs. Our results focus on the edge-based $s$-line graph expansion, but the methods we develop work for higher-order clique expansions as well. To the best of our knowledge, ours is the first framework to enable hypergraph spectral analysis of a large dataset on a single shared-memory machine. Our methods enable the analysis of datasets from many domains that previous graph-expansion-based models are unable to provide. The proposed $s$-line graph computation algorithms are orders of magnitude faster than state-of-the-art sparse general matrix-matrix multiplication methods, and obtain approximately $5-31{\times}$ speedup over a prior state-of-the-art heuristic-based algorithm for $s$-line graph computation.

[6]  arXiv:2201.11454 [pdf, other]
Title: Estimating the Capacities of Function-as-a-Service Functions
Comments: 8 pages, Accepted at CloudAM'21 Workshop (UCC)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Other Statistics (stat.OT)

Serverless computing is a cloud computing paradigm that allows developers to focus exclusively on business logic as cloud service providers manage resource management tasks. Serverless applications follow this model, where the application is decomposed into a set of fine-grained Function-as-a-Service (FaaS) functions. However, the obscurities of the underlying system infrastructure and dependencies between FaaS functions within the application pose a challenge for estimating the performance of FaaS functions. To characterize the performance of a FaaS function that is relevant for the user, we define Function Capacity (FC) as the maximal number of concurrent invocations the function can serve in a time without violating the Service-Level Objective (SLO).
The paper addresses the challenge of quantifying the FC individually for each FaaS function within a serverless application. This challenge is addressed by sandboxing a FaaS function and building its performance model. To this end, we develop FnCapacitor - an end-to-end automated Function Capacity estimation tool. We demonstrate the functioning of our tool on Google Cloud Functions (GCF) and AWS Lambda. FnCapacitor estimates the FCs on different deployment configurations (allocated memory & maximum function instances) by conducting time-framed load tests and building various models using statistical: linear, ridge, and polynomial regression, and Deep Neural Network (DNN) methods on the acquired performance data. Our evaluation of different FaaS functions shows relatively accurate predictions, with an accuracy greater than 75% using DNN for both cloud providers.

[7]  arXiv:2201.11655 [pdf, other]
Title: BFS based distributed algorithm for parallel local directed sub-graph enumeration
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

Estimating the frequency of sub-graphs is of importance for many tasks, including sub-graph isomorphism, kernel-based anomaly detection, and network structure analysis. While multiple algorithms were proposed for full enumeration or sampling-based estimates, these methods fail in very large graphs. Recent advances in parallelization allow for estimates of total sub-graphs counts in very large graphs. The task of counting the frequency of each sub-graph associated with each vertex also received excellent solutions for undirected graphs. However, there is currently no good solution for very large directed graphs.
We here propose VDMC (Vertex specific Distributed Motif Counting) -- a fully distributed algorithm to optimally count all the 3 and 4 vertices connected directed graphs (sub-graph motifs) associated with each vertex of a graph. VDMC counts each motif only once and its efficacy is linear in the number of counted motifs. It is fully parallelized to be efficient in GPU-based computation. VDMC is based on three main elements: 1) Ordering the vertices and only counting motifs containing increasing order vertices, 2) sub-ordering motifs based on the average length of the BFS composing the motif, and 3) removing isomorphisms only once for the entire graph. We here compare VDMC to analytical estimates of the expected number of motifs and show its accuracy. VDMC is available as a highly efficient CPU and GPU code with a novel data structure for efficient graph manipulation. We show the efficacy of VDMC and real-world graphs. VDMC allows for the precise analysis of sub-graph frequency around each vertex in large graphs and opens the way for the extension of methods until now limited to graphs of thousands of edges to graphs with millions of edges and above.
GIT: https://github.com/louzounlab/graph-measures

[8]  arXiv:2201.11668 [pdf, other]
Title: Efficient Hierarchical Storage Management Framework Empowered by Reinforcement Learning
Comments: 20 pages, 13 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

With the rapid development of big data and cloud computing, data management has become increasingly challenging. Over the years, a number of frameworks for data management and storage with various characteristics and features have become available. Most of these are highly efficient, but ultimately create data silos. It becomes difficult to move and work coherently with data as new requirements emerge as no single framework can efficiently fulfill the data management needs of diverse applications. A possible solution is to design smart and efficient hierarchical (multi-tier) storage solutions. A hierarchical storage system (HSS) is a meta solution that consists of different storage frameworks organized as a jointly constructed large storage pool. It brings a number of benefits including better utilization of the storage, cost-efficiency, and use of different features provided by the underlying storage frameworks. In order to maximize the gains of hierarchical storage solutions, it is important that they include intelligent and autonomous mechanisms for data management grounded in the features of the different underlying frameworks. These decisions should be made according to the characteristics of the dataset, tier status, and access patterns. These are highly dynamic parameters and defining a policy based on the mentioned parameters is a non-trivial task. This paper presents an open-source hierarchical storage framework with a dynamic migration policy based on reinforcement learning (RL). We present a mathematical model, a software architecture, and an implementation based on both simulations and a live cloud-based environment. We compare the proposed RL-based strategy to a baseline of three rule-based policies, showing that the RL-based policy achieves significantly higher efficiency and optimal data distribution in different scenarios compared to the dynamic rule-based policies.

[9]  arXiv:2201.11727 [pdf, other]
Title: Reinforced Cooperative Load Balancing in Data Center
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)

Network load balancers are central components in modern data centers, that cooperatively distribute workloads of high arrival rates across application servers, thereby contribute to offering scalable services. The independent and "selfish" load balancing strategy is not necessarily the globally optimal one. This paper represents the load balancing problem as a cooperative team-game with limited observations over system states, and adopts multi-agent reinforcement learning methods to make fair load balancing decisions without inducing additional processing latency. On both a simulation and an emulation system, the proposed method is evaluated against other load balancing algorithms, including state-of-the-art heuristics and learning-based strategies. Experiments under different settings and complexities show the advantageous performance of the proposed method.

Cross-lists for Fri, 28 Jan 22

[10]  arXiv:2201.11133 (cross-list from gr-qc) [pdf, other]
Title: Inference-optimized AI and high performance computing for gravitational wave detection at scale
Comments: 19 pages, 8 figure
Subjects: General Relativity and Quantum Cosmology (gr-qc); Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

We introduce an ensemble of artificial intelligence models for gravitational wave detection that we trained in the Summit supercomputer using 32 nodes, equivalent to 192 NVIDIA V100 GPUs, within 2 hours. Once fully trained, we optimized these models for accelerated inference using NVIDIA TensorRT. We deployed our inference-optimized AI ensemble in the ThetaGPU supercomputer at Argonne Leadership Computer Facility to conduct distributed inference. Using the entire ThetaGPU supercomputer, consisting of 20 nodes each of which has 8 NVIDIA A100 Tensor Core GPUs and 2 AMD Rome CPUs, our NVIDIA TensorRT-optimized AI ensemble porcessed an entire month of advanced LIGO data (including Hanford and Livingston data streams) within 50 seconds. Our inference-optimized AI ensemble retains the same sensitivity of traditional AI models, namely, it identifies all known binary black hole mergers previously identified in this advanced LIGO dataset and reports no misclassifications, while also providing a 3X inference speedup compared to traditional artificial intelligence models. We used time slides to quantify the performance of our AI ensemble to process up to 5 years worth of advanced LIGO data. In this synthetically enhanced dataset, our AI ensemble reports an average of one misclassification for every month of searched advanced LIGO data. We also present the receiver operating characteristic curve of our AI ensemble using this 5 year long advanced LIGO dataset. This approach provides the required tools to conduct accelerated, AI-driven gravitational wave detection at scale.

[11]  arXiv:2201.11149 (cross-list from cs.IT) [pdf, other]
Title: DoF of a Cooperative X-Channel with an Application to Distributed Computing
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

We consider a cooperative X-channel with $\sf K$ transmitters (TXs) and $\sf K$ receivers (Rxs) where Txs and Rxs are gathered into groups of size $\sf r$ respectively. Txs belonging to the same group cooperate to jointly transmit a message to each of the $\sf K- \sf r$ Rxs in all other groups, and each Rx individually decodes all its intended messages. By introducing a new interference alignment (IA) scheme, we prove that when $\sf K/\sf r$ is an integer the sum Degrees of Freedom (SDoF) of this channel is lower bounded by $2\sf r$ if $\sf K/\sf r \in \{2,3\}$ and by $\frac{\sf K(\sf K-\sf r)-\sf r}{2\sf K-3\sf r}$ if $\sf K/\sf r \geq 4$. We also prove that the SDoF is upper bounded by $\frac{\sf K(\sf K-\sf r)}{2\sf K-3\sf r}$. The proposed IA scheme finds application in a wireless distributed MapReduce framework, where it improves the normalized data delivery time (NDT) compared to the state of the art.

[12]  arXiv:2201.11380 (cross-list from cs.LG) [pdf, other]
Title: Achieving Personalized Federated Learning with Sparse Local Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Federated learning (FL) is vulnerable to heterogeneously distributed data, since a common global model in FL may not adapt to the heterogeneous data distribution of each user. To counter this issue, personalized FL (PFL) was proposed to produce dedicated local models for each individual user. However, PFL is far from its maturity, because existing PFL solutions either demonstrate unsatisfactory generalization towards different model architectures or cost enormous extra computation and memory. In this work, we propose federated learning with personalized sparse mask (FedSpa), a novel PFL scheme that employs personalized sparse masks to customize sparse local models on the edge. Instead of training an intact (or dense) PFL model, FedSpa only maintains a fixed number of active parameters throughout training (aka sparse-to-sparse training), which enables users' models to achieve personalization with cheap communication, computation, and memory cost. We theoretically show that the iterates obtained by FedSpa converge to the local minimizer of the formulated SPFL problem at rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$. Comprehensive experiments demonstrate that FedSpa significantly saves communication and computation costs, while simultaneously achieves higher model accuracy and faster convergence speed against several state-of-the-art PFL methods.

[13]  arXiv:2201.11526 (cross-list from astro-ph.IM) [pdf, ps, other]
Title: A distributed computing infrastructure for LOFAR Italian community
Comments: In Astronomical Data Analysis Software and Systems (ADASS) XXXI
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Distributed, Parallel, and Cluster Computing (cs.DC)

The LOw-Frequency ARray is a low-frequency radio interferometer composed by observational stations spread across Europe and it is the largest precursor of SKA in terms of effective area and generated data rates. In 2018, the Italian community officially joined LOFAR project, and it deployed a distributed computing and storage infrastructure dedicated to LOFAR data analysis. The infrastructure is based on 4 nodes distributed in different Italian locations and it offers services for pipelines execution, storage of final and intermediate results and support for the use of the software and infrastructure. As the analysis of the LOw-Frequency ARray data requires a very complex computational procedure, a container-based approach has been adopted to distribute software environments to the different computing resources. A science platform approach is used to facilitate interactive access to computational resources. In this paper, we describe the architecture and main features of the infrastructure.

[14]  arXiv:2201.11603 (cross-list from cs.CR) [pdf, other]
Title: Plume: Differential Privacy at Scale
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

Differential privacy has become the standard for private data analysis, and an extensive literature now offers differentially private solutions to a wide variety of problems. However, translating these solutions into practical systems often requires confronting details that the literature ignores or abstracts away: users may contribute multiple records, the domain of possible records may be unknown, and the eventual system must scale to large volumes of data. Failure to carefully account for all three issues can severely impair a system's quality and usability.
We present Plume, a system built to address these problems. We describe a number of sometimes subtle implementation issues and offer practical solutions that, together, make an industrial-scale system for differentially private data analysis possible. Plume is currently deployed at Google and is routinely used to process datasets with trillions of records.

[15]  arXiv:2201.11638 (cross-list from cs.AR) [pdf, other]
Title: Reuse-Aware Cache Partitioning Framework for Data-Sharing Multicore Systems
Comments: 2 pages. 7th IEEE International Symposium on Smart Electronic Systems (iSES) 2021
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)

Multi-core processors improve performance, but they can create unpredictability owing to shared resources such as caches interfering. Cache partitioning is used to alleviate the Worst-Case Execution Time (WCET) estimation by isolating the shared cache across each thread to reduce interference. It does, however, prohibit data from being transferred between parallel threads running on different cores. In this paper we present (SRCP) a cache replacement mechanism for partitioned caches that is aware of data being shared across threads, prevents shared data from being replicated across partitions and frequently used data from being evicted from caches. Our technique outperforms TA-DRRIP and EHC, which are existing state-of-the-art cache replacement algorithms, by 13.34% in cache hit-rate and 10.4% in performance over LRU (least recently used) cache replacement policy.

Replacements for Fri, 28 Jan 22

[16]  arXiv:2009.00081 (replaced) [pdf, other]
Title: Federated Edge Learning : Design Issues and Challenges
Comments: Submitted to IEEE Network Magazine
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
[17]  arXiv:2102.09491 (replaced) [pdf, ps, other]
Title: Data-Aware Device Scheduling for Federated Edge Learning
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[18]  arXiv:2110.15032 (replaced) [pdf, ps, other]
Title: OneFlow: Redesign the Distributed Deep Learning Framework from Scratch
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[19]  arXiv:2201.05072 (replaced) [pdf, other]
Title: SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems
Comments: To appear in the Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS) 2022 and the ACM SIGMETRICS 2022 conference
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[ total of 19 entries: 1-19 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2201, contact, help  (Access key information)