Distributed, Parallel, and Cluster Computing

New submissions

[ total of 8 entries: 1-8 ]

New submissions for Mon, 25 May 20

[1]  arXiv:2005.11050 [pdf, other]
Title: Autonomous Task Dropping Mechanism to Achieve Robustness in Heterogeneous Computing Systems
Journal-ref: in 29th Heterogeneity in Computing Workshop (HCW 2019), in the Proceedings of the IPDPS 2019 Workshops & PhD Forum (IPDPSW)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Operating Systems (cs.OS); Performance (cs.PF)

Robustness of a distributed computing system is defined as the ability to maintain its performance in the presence of uncertain parameters. Uncertainty is a key problem in heterogeneous (and even homogeneous) distributed computing systems that perturbs system robustness. Notably, the performance of these systems is perturbed by uncertainty in both task execution time and arrival. Accordingly, our goal is to make the system robust against these uncertainties. Considering task execution time as a random variable, we use probabilistic analysis to develop an autonomous proactive task dropping mechanism to attain our robustness goal. Specifically, we provide a mathematical model that identifies the optimality of a task dropping decision, so that the system robustness is maximized. Then, we leverage the mathematical model to develop a task dropping heuristic that achieves the system robustness within a feasible time complexity. Although the proposed model is generic and can be applied to any distributed system, we concentrate on heterogeneous computing (HC) systems that have a higher degree of exposure to uncertainty than homogeneous systems. Experimental results demonstrate that the autonomous proactive dropping mechanism can improve the system robustness by up to 20%.
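
The abstract does not spell out the paper's model, so the following is only a minimal sketch of the general idea of probabilistic task dropping, not the authors' mechanism. It assumes execution and start times are given as discrete probability mass functions (dicts mapping time to probability) and that a task is dropped when its chance of meeting the deadline falls below a hypothetical `threshold`; all function names here are illustrative.

```python
from itertools import product


def completion_pmf(start_pmf, exec_pmf):
    """Convolve start-time and execution-time PMFs (dict: time -> probability)."""
    out = {}
    for (s, ps), (e, pe) in product(start_pmf.items(), exec_pmf.items()):
        out[s + e] = out.get(s + e, 0.0) + ps * pe
    return out


def success_probability(completion, deadline):
    """Probability that the task completes at or before its deadline."""
    return sum(p for t, p in completion.items() if t <= deadline)


def should_drop(start_pmf, exec_pmf, deadline, threshold=0.25):
    """Drop the task if its chance of success is below the (assumed) threshold."""
    completion = completion_pmf(start_pmf, exec_pmf)
    return success_probability(completion, deadline) < threshold


if __name__ == "__main__":
    start = {0: 1.0}                      # task can start immediately
    execution = {2: 0.5, 4: 0.3, 6: 0.2}  # uncertain execution time
    print(should_drop(start, execution, deadline=4))
    print(should_drop(start, execution, deadline=1))
```

A fixed threshold is the simplest possible policy; the abstract indicates the paper instead derives when a dropping decision is optimal for overall robustness.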

[2]  arXiv:2005.11054 [pdf]
Title: Reasonableness discussion and analysis for Hyperledger Fabric configuration
Comments: 8 pages, 5 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Blockchain, as a distributed ledger technology, has become increasingly popular in both industry and academia. Each peer in a blockchain system maintains a copy of the ledger and ensures data consistency through a consensus protocol. Blockchain systems can provide many benefits, such as immutability, transparency and security. Hyperledger Fabric is a permissioned blockchain platform hosted by the Linux Foundation. Fabric has various components, such as peers, the ordering service, chaincode and the state database. A Fabric network needs a complicated structure to provide a reliable permissioned blockchain service. Generally, developers must deal with hundreds of parameters to configure a network, which leads to many reasonableness problems in configurations. In this paper, we focus on how to detect reasonableness problems in Fabric configurations. First, we discuss and provide a knowledge database of reasonableness problems from the perspectives of functionality, security and performance. Second, we implement a detection tool that checks Fabric configurations for reasonableness. Finally, we collect 108 sample networks as the testing dataset for our experiment. The results show that our tool can help developers locate reasonableness problems and better understand their networks.

[3]  arXiv:2005.11158 [pdf, other]
Title: Hermes: Enabling Energy-efficient IoT Networks with Generalized Deduplication
Comments: This work was partially financed by the SCALE-IoT Project (Grant No. 7026-00042B) granted by the Independent Research Fund Denmark, by the Aarhus Universitets Forskningsfond (AUFF) Starting Grant Project AUFF-2017-FLS-7-1, and Aarhus University's DIGIT Centre. European Commission Project: LEGaTO - Low Energy Toolset for Heterogeneous Computing (EC-H2020-780681)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

With the advent of the Internet of Things (IoT), the ever-growing number of connected devices observed in recent years and foreseen for the next decade suggests that more and more data will have to be transmitted over a network before being processed and stored in data centers. Generalized deduplication (GD) is a novel technique that effectively reduces data storage cost by identifying similar data chunks, and it can gradually reduce the pressure on the network infrastructure by limiting the data that needs to be transmitted.
This paper presents Hermes, an application-level protocol for the data-plane that can operate over generalized deduplication, as well as over classic deduplication. Hermes significantly reduces the data transmission traffic while effectively decreasing the energy footprint, a relevant matter to consider in the context of IoT deployments. We fully implemented Hermes and evaluated its performance using consumer-grade IoT devices (e.g., Raspberry Pi 4B models). Our results highlight several trade-offs that must be taken into account when considering real-world workloads.
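
To illustrate how generalized deduplication can match similar, not just identical, chunks, here is a minimal sketch. It is not Hermes or the authors' transformation: it assumes a deliberately simple split in which the low `dev_bits` bits of every byte form the "deviation" and the remaining bits form the "base", so chunks that differ only in those low bits share one stored base. The `GDStore` class and `split_chunk` function are hypothetical names.

```python
def split_chunk(chunk: bytes, dev_bits: int = 2):
    """Split a chunk into a base (high bits) and a deviation (low bits)."""
    mask = (1 << dev_bits) - 1
    base = bytes(b & ~mask & 0xFF for b in chunk)
    deviation = bytes(b & mask for b in chunk)
    return base, deviation


class GDStore:
    """Toy store: bases are deduplicated, deviations are kept per chunk."""

    def __init__(self, dev_bits: int = 2):
        self.dev_bits = dev_bits
        self.base_ids = {}   # base bytes -> index into self.bases
        self.bases = []      # unique bases, stored once
        self.records = []    # one (base_id, deviation) pair per chunk

    def put(self, chunk: bytes) -> int:
        base, deviation = split_chunk(chunk, self.dev_bits)
        base_id = self.base_ids.get(base)
        if base_id is None:
            base_id = len(self.bases)
            self.base_ids[base] = base_id
            self.bases.append(base)
        # Deviations are kept one byte per input byte for clarity;
        # a real system would pack them into dev_bits bits each.
        self.records.append((base_id, deviation))
        return len(self.records) - 1

    def get(self, index: int) -> bytes:
        base_id, deviation = self.records[index]
        return bytes(b | d for b, d in zip(self.bases[base_id], deviation))


if __name__ == "__main__":
    store = GDStore()
    a = store.put(bytes([100, 101, 102]))
    b = store.put(bytes([101, 100, 103]))  # similar, not identical
    print(len(store.bases), store.get(a), store.get(b))
```

Classic deduplication corresponds to the special case `dev_bits=0`, where only byte-identical chunks share a base.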

Cross-lists for Mon, 25 May 20

[4]  arXiv:2005.10855 (cross-list from cs.NI) [pdf, other]
Title: Modeling and Optimization of Latency in Erasure-coded Storage Systems
Comments: Monograph for use by researchers interested in latency aspects of distributed storage systems
Subjects: Networking and Internet Architecture (cs.NI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

As consumers increasingly engage in social networking and e-commerce, businesses come to rely on Big Data analytics for intelligence, and traditional IT infrastructures continue to migrate to the cloud and edge, demand for distributed data storage is rising at an unprecedented speed. Erasure coding has quickly emerged as a promising technique to reduce storage cost while providing reliability similar to that of replicated systems, and it has been widely adopted by companies like Facebook, Microsoft and Google. However, it also brings new challenges in characterizing and optimizing access latency when erasure codes are used in distributed storage. The aim of this monograph is to provide a review of recent progress (both theoretical and practical) on systems that employ erasure codes for distributed storage.
In this monograph, we will first identify the key challenges and taxonomy of the research problems and then give an overview of different approaches that have been developed to quantify and model latency of erasure-coded storage. This includes recent work leveraging MDS-Reservation, Fork-Join, Probabilistic, and Delayed-Relaunch scheduling policies, as well as their applications to characterize access latency (e.g., mean, tail, asymptotic latency) of erasure-coded distributed storage systems. We will also extend the problem to the case when users are streaming videos from erasure-coded distributed storage systems. Next, we bridge the gap between theory and practice, and discuss lessons learned from prototype implementation. In particular, we will discuss exemplary implementations of erasure-coded storage, illuminate key design degrees of freedom and tradeoffs, and summarize remaining challenges in real-world storage systems such as in content delivery and caching. Open problems for future research are discussed at the end of each chapter.
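
As a small illustration of why latency modeling for erasure-coded storage involves order statistics, the sketch below considers a single read of an (n, k) MDS-coded file that is forked to all n nodes and completes when the fastest k respond. It assumes i.i.d. exponential service times with rate `mu` and ignores queueing entirely, so it is a simplification chosen here rather than any of the models surveyed in the monograph. Under that assumption the mean latency is (1/mu) * sum_{i=n-k+1}^{n} 1/i, which the simulation checks numerically.

```python
import random


def expected_kth_of_n(n: int, k: int, mu: float = 1.0) -> float:
    """Mean of the k-th smallest of n i.i.d. Exp(mu) service times."""
    return sum(1.0 / i for i in range(n - k + 1, n + 1)) / mu


def simulate_kth_of_n(n, k, mu=1.0, trials=20_000, seed=0):
    """Monte Carlo estimate of the same quantity (no queueing modeled)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = sorted(rng.expovariate(mu) for _ in range(n))
        total += times[k - 1]  # request finishes when k chunks have arrived
    return total / trials


if __name__ == "__main__":
    for n, k in [(4, 4), (6, 4), (8, 4)]:
        print(n, k, expected_kth_of_n(n, k), simulate_kth_of_n(n, k))
```

With k fixed, raising n lowers the mean read latency in this toy model, at the cost of more storage and more load per request; that is the kind of tradeoff the abstract refers to.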

[5]  arXiv:2005.11026 (cross-list from cs.DS) [pdf, other]
Title: Target Location Problem for Multi-commodity Flow
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

Motivated by scheduling in Geo-distributed data analysis, we propose a target location problem for multi-commodity flow (LoMuF for short). Given commodities to be sent from their resources, LoMuF aims at locating their targets so that the multi-commodity flow is optimized in some sense. LoMuF is a combination of two fundamental problems, namely, the facility location problem and the network flow problem. We study the hardness and algorithmic issues of the problem in various settings. The findings lie in three aspects. First, a series of NP-hardness and APX-hardness results are obtained, uncovering the inherent difficulty in solving this problem. Second, we propose an approximation algorithm for general undirected networks and an exact algorithm for undirected trees, which naturally induce efficient approximation algorithms on directed networks. Third, we observe separations between directed networks and undirected ones, indicating that imposing direction on edges makes the problem strictly harder. These results show the richness of the problem and pave the way to further studies.

[6]  arXiv:2005.11259 (cross-list from cs.DB) [pdf, other]
Title: CAPre: Code-Analysis based Prefetching for Persistent Object Stores
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)

Data prefetching aims to improve access times to data storage systems by predicting data records that are likely to be accessed by subsequent requests and retrieving them into a memory cache before they are needed. In the case of Persistent Object Stores, previous approaches to prefetching have been based either on predictions made through analysis of the store's schema, which generates rigid predictions, or on monitoring the access patterns to the store while applications are executed, which introduces memory and/or computation overhead. In this paper, we present CAPre, a novel prefetching system for Persistent Object Stores based on static code analysis of object-oriented applications. CAPre generates its predictions at compile time and does not introduce any overhead to the application execution. Moreover, given that CAPre is able to predict large numbers of objects that will be accessed in the near future, it enables the object store to perform parallel prefetching if the objects are distributed, in a much more aggressive way than schema-based prediction algorithms. We integrate CAPre into a distributed Persistent Object Store and run a series of experiments showing that it can reduce the execution time of applications by 9% to over 50%, depending on the application.

Replacements for Mon, 25 May 20

[7]  arXiv:2003.07491 (replaced) [pdf, ps, other]
Title: The Power of Global Knowledge on Self-stabilizing Population Protocols
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[8]  arXiv:1901.01007 (replaced) [pdf, other]
Title: FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters
Subjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)