Distributed, Parallel, and Cluster Computing
New submissions
New submissions for Wed, 22 Jan 20
 [1] arXiv:2001.06778 [pdf, other]

Title: CycLedger: A Scalable and Secure Parallel Protocol for Distributed Ledger via Sharding
Comments: short version in IPDPS 2020
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
Traditional public distributed ledgers have not been able to scale out well and work efficiently. Sharding is deemed a promising way to solve this problem. By partitioning all nodes into small committees and letting them work in parallel, we can significantly lower the amount of communication and computation, reduce the overhead on each node's storage, as well as enhance the throughput of the distributed ledger. Existing sharding-based protocols still suffer from several serious drawbacks. The first is that all honest nodes must connect well with each other, which demands a huge number of communication channels in the network. Moreover, previous protocols suffer a great loss in efficiency when the honesty of each committee's leader is in question. At the same time, no explicit incentive is provided for nodes to actively participate in the protocol.
We present CycLedger, a scalable and secure parallel protocol for distributed ledger via sharding. Our protocol selects a leader and a partial set for each committee, who are in charge of maintaining intra-shard consensus and communicating with other committees, to reduce the amortized complexity of communication, computation and storage on all nodes. We introduce a novel commitment scheme between committees and a recovery procedure to prevent the system from crashing even when leaders of committees are malicious. To add incentive for the network, we use the concept of reputation, which measures each node's computing power. As nodes with higher reputation receive more rewards, nodes with strong computing ability are encouraged to work honestly so as to gain reputation. In this way, we strike out a new path to establish scalability, security and incentive for the sharding-based distributed ledger.
 [2] arXiv:2001.06935 [pdf, other]

Title: 75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices
Authors: Jeremy Kepner, Tim Davis, Chansup Byun, William Arcand, David Bestor, William Bergeron, Vijay Gadepally, Matthew Hubbell, Michael Houle, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Siddharth Samsi, Charles Yee, Albert Reuther
Comments: 3 pages, 2 figures, 28 references, accepted to Northeast Database Day (NEDB) 2020. arXiv admin note: substantial text overlap with arXiv:1907.04217
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Data Structures and Algorithms (cs.DS); Performance (cs.PF); Social and Information Networks (cs.SI)
The SuiteSparse GraphBLAS C library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into a hypersparse matrix. The parameters of hierarchical hypersparse matrices rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
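The cascading idea in this abstract can be illustrated with a small sketch in plain Python, using dictionaries as stand-ins for hypersparse matrices. The class name `HierarchicalSparse`, the `cuts` parameter, and the dictionary representation are assumptions made for illustration; this is not the SuiteSparse GraphBLAS API and says nothing about how the benchmarked implementation is written.

```python
from collections import defaultdict


class HierarchicalSparse:
    """Toy hierarchical sparse accumulator (illustrative, not GraphBLAS)."""

    def __init__(self, cuts):
        # cuts[i] is the assumed maximum number of stored entries that
        # level i may hold before it is cascaded into level i + 1.
        self.cuts = list(cuts)
        # One more level than cuts: the last level is unbounded.
        self.levels = [defaultdict(int) for _ in range(len(self.cuts) + 1)]

    def update(self, row, col, value=1):
        """Add value at (row, col) in the smallest level, then cascade."""
        self.levels[0][(row, col)] += value
        for i, cut in enumerate(self.cuts):
            if len(self.levels[i]) <= cut:
                break
            # Cascade: add level i into level i + 1, then clear level i.
            for key, val in self.levels[i].items():
                self.levels[i + 1][key] += val
            self.levels[i].clear()

    def total(self):
        """Sum all levels; by linearity this equals the flat matrix."""
        out = defaultdict(int)
        for level in self.levels:
            for key, val in level.items():
                out[key] += val
        return dict(out)


def demo():
    h = HierarchicalSparse(cuts=[2, 4])
    edges = [(0, 1), (0, 1), (2, 3), (4, 5), (0, 1), (6, 7)]
    for r, c in edges:
        h.update(r, c)
    return h.total()
```

Most updates touch only the small first level, which is the intuition behind the reduced memory pressure; the cut values play the role of the tunable parameters mentioned in the abstract.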
 [3] arXiv:2001.06989 [pdf, other]

Title: The Parallelism Motifs of Genomic Data Analysis
Authors: Katherine Yelick, Aydin Buluc, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
 [4] arXiv:2001.07000 [pdf, ps, other]

Title: Contract-connection: An efficient communication protocol for Distributed Ledger Technology
Journal-ref: 2019 IEEE 38th International Performance Computing and Communications Conference (IPCCC)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
Distributed Ledger Technology (DLT) is promising to become the foundation of many decentralised systems. However, the unbalanced and unregulated network layout contributes to the inefficiency of DLT, especially in Internet of Things (IoT) environments, where nodes connect to only a limited number of peers. The global data communication speed is unbalanced and does not meet the constraints of efficient real-time distributed systems. In this paper, we introduce a new communication protocol, which enables nodes to calculate the tradeoff between connecting to and disconnecting from a peer in a completely decentralised manner. The global network layout is continuously rebalanced and optimised as nodes adjust their peers. This communication protocol reduces the inequality of the communication network. The experiment suggests this communication protocol is stable and efficient.
 [5] arXiv:2001.07022 [pdf, other]

Title: BAASH: Enabling Blockchain-as-a-Service on High-Performance Computing Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The state-of-the-art approach to manage blockchains is to process blocks of transactions in a shared-nothing environment. Although blockchains have the potential to provide various services for high-performance computing (HPC) systems, HPC will not be able to embrace blockchains before the following two missing pieces become available: (i) new consensus protocols being aware of the shared-storage architecture in HPC, and (ii) new fault-tolerant mechanisms compensating for HPC's programming model, the message passing interface (MPI), that is vulnerable for blockchain-like workloads. To this end, we design a new set of consensus protocols crafted for the HPC platforms and a new fault-tolerance subsystem compensating for the failures caused by faulty MPI processes. Built on top of the new protocols and fault-tolerance mechanism, a prototype system is implemented and evaluated with two million transactions on a 500-core HPC cluster, showing $6\times$, $12\times$, and $75\times$ higher throughput than Hyperledger, Ethereum, and Parity, respectively.
 [6] arXiv:2001.07077 [pdf, other]

Title: Distributed Vehicular Computing at the Dawn of 5G: a Survey
Comments: 34 pages, 10 figures, Journal
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Recent advances in information technology have revolutionized the automotive industry, paving the way for next-generation smart and connected vehicles. Connected vehicles can collaborate to deliver novel services and applications. These services and applications require 1) massive volumes of data that perceive ambient environments, 2) ultra-reliable and low-latency communication networks, 3) real-time data processing which provides decision support under application-specific constraints. Addressing such constraints introduces significant challenges with current communication and computation technologies. Coincidentally, the fifth generation of cellular networks (5G) was developed to respond to communication challenges by providing an infrastructure for low-latency, high-reliability, and high bandwidth communication. At the core of this infrastructure, edge computing allows data offloading and computation at the edge of the network, ensuring low latency and context-awareness, and pushing the utilization efficiency of 5G to its limit. In this paper, we aim at providing a comprehensive overview of the state of research on vehicular computing in the emerging age of 5G. After reviewing the main vehicular applications requirements and challenges, we follow a bottom-up approach, starting with the promising technologies for vehicular communications, all the way up to Artificial Intelligence (AI) solutions. We explore the various architectures for vehicular computing, including centralized Cloud Computing, Vehicular Cloud Computing, and Vehicular Edge computing, and investigate the potential data analytics technologies and their integration on top of the vehicular computing architectures. We finally discuss several future research directions and applications for vehicular computation systems.
 [7] arXiv:2001.07086 [pdf, other]

Title: 2PS: High-Quality Edge Partitioning with Two-Phase Streaming
Comments: in submission
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges, and one edge at a time is immediately and irrevocably assigned to a partition based on a scoring function. However, streaming partitioning suffers from the uninformed assignment problem: At the time of partitioning early edges in the stream, there is no information available about the rest of the edges. As a consequence, edge assignments are often driven by balancing considerations, and the achieved replication factor is comparably high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for high-quality edge partitioning. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and edge partitioning is performed while taking into account the clustering of the vertices from the first phase. Our evaluations show that 2PS can achieve a replication factor that is comparable to heavyweight random access partitioners while inducing orders of magnitude lower memory overhead.
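For readers unfamiliar with the single-pass baseline this abstract describes, here is a minimal sketch of streaming edge partitioning with a scoring function, plus the replication factor it is judged by. It is not the 2PS algorithm: the greedy score (endpoints already present on a partition, minus a load penalty), the hard `capacity` limit, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict


def stream_partition(edges, k, capacity):
    """Assign each edge once, in stream order, to one of k partitions."""
    replicas = defaultdict(set)   # vertex -> partitions that hold a copy
    load = [0] * k                # number of edges per partition
    assignment = []

    for u, v in edges:
        best, best_score = None, None
        for p in range(k):
            if load[p] >= capacity:
                continue  # hard balance constraint (assumed)
            # Assumed score: endpoints already on p, minus a load penalty.
            score = (p in replicas[u]) + (p in replicas[v]) - load[p] / capacity
            if best_score is None or score > best_score:
                best, best_score = p, score
        if best is None:
            raise ValueError("capacity too small for this stream")
        load[best] += 1
        replicas[u].add(best)
        replicas[v].add(best)
        assignment.append(best)   # irrevocable: never revisited
    return assignment, replicas


def replication_factor(replicas):
    """Average number of partitions on which a vertex is replicated."""
    return sum(len(parts) for parts in replicas.values()) / len(replicas)
```

Because the first edges are scored with empty `replicas`, only the load term matters for them, which is a small-scale view of the uninformed assignment problem the abstract mentions.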
 [8] arXiv:2001.07091 [pdf, other]

Title: Blockchain Consensuses Algorithms: A Survey
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In recent years, blockchain technology has received unparalleled attention from academia, industry, and governments all around the world. It is considered a technological breakthrough anticipated to disrupt several application domains. This has resulted in a plethora of blockchain systems for various purposes. However, many of these blockchain systems suffer from serious shortcomings related to their performance and security, which need to be addressed before any wide-scale adoption can be achieved. A crucial component of any blockchain system is its underlying consensus algorithm, which in many ways determines its performance and security. Therefore, to address the limitations of different blockchain systems, several existing as well as novel consensus algorithms have been introduced. A systematic analysis of these algorithms will help to understand how and why any particular blockchain performs the way it does. However, the existing studies of consensus algorithms are not comprehensive. Those studies have incomplete discussions on the properties of the algorithms and fail to analyse several major blockchain consensus algorithms in terms of their scopes. This article fills this gap by analysing a wide range of consensus algorithms using a comprehensive taxonomy of properties and by examining in detail the implications of different issues still prevalent in consensus algorithms. The result of the analysis is presented in tabular formats, which provide a visual illustration of these algorithms in a meaningful way. We have also analysed more than a hundred top cryptocurrencies belonging to different categories of consensus algorithms to understand their properties and to identify different trends in these cryptocurrencies. Finally, we have presented a decision tree of algorithms to be used as a tool to test the suitability of consensus algorithms under different criteria.
 [9] arXiv:2001.07103 [pdf, ps, other]

Title: OpenMP Parallelization of Dynamic Programming and Greedy Algorithms
Authors: Claude Tadonki
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore implementation for an important algorithmic kernel is a genuine contribution. From a methodology standpoint, this should be done at the level of the underlying paradigm, if any. In this paper, we study the cases of dynamic programming and greedy algorithms, which are two major algorithmic paradigms. We exclusively consider directives-based loop parallelization through OpenMP and investigate necessary pre-transformations to reach a regular parallel form. We evaluate our methodology with a selection of well-known combinatorial optimization problems on an Intel Broadwell processor. Key points for scalability are discussed before and after experimental results. Our immediate perspective is to extend our study to the manycore case, with a special focus on NUMA configurations.
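One widely used pre-transformation of the kind this abstract alludes to is sweeping a dynamic programming table by anti-diagonals, so that the inner loop carries no dependence. The sketch below shows it for longest common subsequence. It is a generic textbook illustration, not code or a method taken from the paper, and it runs sequentially in Python; the comment marks the loop that would receive an OpenMP `parallel for` directive in a C version.

```python
def lcs_length_wavefront(a, b):
    """Length of the longest common subsequence, computed by anti-diagonals."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    # Sweep anti-diagonals d = i + j. Cells on diagonal d read only cells on
    # diagonals d - 1 and d - 2, so iterations of the inner loop are independent.
    for d in range(2, n + m + 1):
        lo = max(1, d - m)
        hi = min(n, d - 1)
        for i in range(lo, hi + 1):  # in C: candidate for "#pragma omp parallel for"
            j = d - i
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

The row-by-row loop nest computes the same table but has a dependence between consecutive inner iterations, which is why a transformation is needed before a loop-level directive can be applied.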
 [10] arXiv:2001.07104 [pdf, other]

Title: A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)
Characterizing compute kernel execution behavior on GPUs for efficient task scheduling is a nontrivial task. We address this with a simple model enabling portable and fast predictions among different GPUs using only extracted hardware-independent features. This model is built based on random forests using 189 individual compute kernels from benchmarks such as Parboil, Rodinia, Polybench-GPU and SHOC. Evaluation of the model performance using cross-validation yields a median Mean Average Percentage Error (MAPE) of [13.45%, 44.56%] and [1.81%, 2.91%] for time and power prediction, respectively, on five different GPUs, while latency for a single prediction varies between 0.1 and 0.2 seconds.
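As a reading aid for the reported metric, the sketch below computes a percentage error per cross-validation fold and then takes the median across folds. It assumes the usual mean absolute percentage error definition, $100 \cdot \frac{1}{N}\sum |p_i - a_i| / |a_i|$; the abstract does not spell out its formula, and the function names and fold layout here are hypothetical.

```python
from statistics import median


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def median_mape(folds):
    """Median of the per-fold MAPE values; folds is a list of (actual, predicted)."""
    return median(mape(a, p) for a, p in folds)
```

A call such as `median_mape([(measured_fold_1, predicted_fold_1), ...])` would yield one number per GPU, which is consistent with the abstract reporting a range over five GPUs.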
 [11] arXiv:2001.07276 [pdf]

Title: Transparently Capturing Request Execution Path for Anomaly Detection
Comments: 13 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
With the increasing scale and complexity of cloud systems and big data analytics platforms, it is becoming more and more challenging to understand and diagnose the processing of a service request in such distributed platforms. One way that helps to deal with this problem is to capture the complete end-to-end execution path of service requests among all involved components accurately. This paper presents REPTrace, a generic methodology for capturing such execution paths in a transparent fashion. We analyze a comprehensive list of execution scenarios, and propose principles and algorithms for generating the end-to-end request execution path for all the scenarios. Moreover, this paper presents an anomaly detection approach exploiting request execution paths to detect anomalies of the execution during request processing. The experiments on four popular distributed platforms with different workloads show that REPTrace can transparently capture the accurate request execution path with reasonable latency and negligible network overhead. Fault injection experiments show that execution anomalies are detected with high recall (96%).
 [12] arXiv:2001.07490 [pdf, other]

Title: Serverless Straggler Mitigation using Local Error-Correcting Codes
Authors: Vipul Gupta, Dominic Carrano, Yaoqing Yang, Vaishaal Shankar, Thomas Courtade, Kannan Ramchandran
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)
Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication and evaluate them on several common applications from machine learning and high-performance computing. The proposed schemes are inspired by error-correcting codes and employ parallel encoding and decoding over the data stored in the cloud using serverless workers. This creates a fully distributed computing framework without using a master node to conduct encoding or decoding, which removes the computation, communication and storage bottleneck at the master. On the theory side, we establish that our proposed scheme is asymptotically optimal in terms of decoding time and provide a lower bound on the number of stragglers it can tolerate with high probability. Through extensive experiments, we show that our scheme outperforms existing schemes such as speculative execution and other coding-theoretic methods by at least 25%.
 [13] arXiv:2001.07496 [pdf]

Title: Self Organization Agent Oriented Dynamic Resource Allocation on Open Federated Clouds Environment
Journal-ref: International Conference on Cloud Computing Technologies and Applications, CLOUDTECH 2016, May 2016, Marrakech, Morocco
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
To ensure uninterrupted services to cloud clients from federated cloud providers, it is important to guarantee an efficient allocation of cloud resources to users, improving the rate of client satisfaction and the quality of service provision. It is better to obtain as many computing and storage resources as possible. In the cloud domain, several Multi Agent Resource Allocation methods have been proposed to address the problem of dynamic resource allocation. However, the problem is still open and much work remains to be done in this field. Robustness is important in cloud computing, so in this paper we focus on an auto-adaptive method to deal with changes in an open federated cloud computing environment. Our approach is hybrid: we first adopt an existing organization optimization approach for self-organization in broker agent organizations and combine it with an existing Multi Agent Resource Allocation approach on Federated Clouds. We consider an open cloud federation environment that is dynamic and in constant evolution, in which new cloud operators can join or leave the federation. At the same time, our approach is multi-criterion and can take into account various parameters (e.g., computing load balance of the mediator agent, geographical distance (network delay) between customer and provider...).
 [14] arXiv:2001.07497 [pdf]

Title: An IoT Platform-as-a-service for NFV Based Hybrid Cloud / Fog Systems
Authors: Carla Mouradian, Fereshteh Ebrahimnezhad, Yassine Jebbar, Jasmeen Kaur Ahluwalia, Seyedeh Negar Afrasiabi, Roch H. Glitho, Ashok Moghe
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)
Cloud computing, despite its inherent advantages (e.g., resource efficiency), still faces several challenges. The wide area network used to connect the cloud to end-users could cause high latency, which may not be tolerable for some applications, especially Internet of Things (IoT) applications. Fog computing can reduce this latency by extending the traditional cloud architecture to the edge of the network and by enabling the deployment of some application components on fog nodes. Application providers use Platform-as-a-Service (PaaS) to provision (i.e., develop, deploy, manage, and orchestrate) applications in the cloud. However, existing PaaS solutions (including IoT PaaS) usually focus on the cloud and do not enable provisioning of applications with components spanning cloud and fog. Provisioning such applications requires novel functions, such as application graph generation, that are absent from existing PaaS. Furthermore, several functions offered by existing PaaS (e.g., publication/discovery) need to be significantly extended in order to fit in a hybrid cloud/fog environment. In this paper, we propose a novel architecture for PaaS for hybrid cloud/fog systems. It is IoT use case-driven, and its applications' components are implemented as Virtual Network Functions (VNFs) with execution sequences modeled as graphs with substructures such as selection and loops. It automates the provisioning of applications with components spanning cloud and fog. In addition, it enables the discovery of existing cloud and fog nodes and generates application graphs. A proof of concept is built based on the open-source Cloudify platform. Feasibility is demonstrated by evaluating its performance when PaaS modules and application components are placed in clouds and fogs in different geographical locations.
 [15] arXiv:2001.07557 [pdf, other]

Title: Lattice QCD on a novel vector architecture
Comments: Proceedings of Lattice 2019, 7 pages, 6 colorful figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); High Energy Physics - Lattice (hep-lat)
The SX-Aurora TSUBASA PCIe accelerator card is the newest model of NEC's SX architecture family. Its multicore vector processor features a vector length of 16 kbits and interfaces with up to 48 GB of HBM2 memory in the current models, available since 2018. The compute performance is up to 2.45 TFlop/s peak in double precision, and the memory throughput is up to 1.2 TB/s peak. New models with improved performance characteristics are announced for the near future. In this contribution we discuss key aspects of the SX-Aurora and describe how we enabled the architecture in the Grid Lattice QCD framework.
Cross-lists for Wed, 22 Jan 20
 [16] arXiv:2001.07016 (cross-list from cs.CR) [pdf, other]

Title: BlockHouse: Blockchain-based Distributed Storehouse System
Comments: Published in "9TH Latin-American Symposium on Dependable Computing", 2019
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
In this paper we propose BlockHouse, a decentralized/P2P storage system fully based on private blockchains. Each participant can rent out their unused storage in order to host data of other members. This system uses a dual Smart Contract and Proof of Retrievability system to automatically check at a fixed frequency whether the file is still hosted. In addition to transparency, the blockchain allows a better integration with all payments associated with this type of system (regular payments, sequestration to ensure good behavior of users, ...). Except for the data transferred between the client and the server, all actions go through a smart contract in the blockchain in order to log, pay for and secure the entire storage process.
 [17] arXiv:2001.07023 (cross-list from cs.CR) [pdf, ps, other]

Title: Segment blockchain: A size reduced storage mechanism for blockchain
Journal-ref: IEEE Access, 2020. https://ieeexplore.ieee.org/document/8957450/
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
The exponential growth of the blockchain size has become a major contributing factor that hinders the decentralisation of blockchain and its potential implementations in data-heavy applications. In this paper, we propose segment blockchain, an approach that segmentises blockchain and enables nodes to only store a copy of one blockchain segment. We use PoW as a membership threshold to limit the number of nodes taken by an Adversary: the Adversary can only gain at most $n/2$ of nodes in a network of $n$ nodes when it has $50\%$ of the calculation power in the system (the Nakamoto blockchain security threshold). A segment blockchain system fails when an Adversary stores all copies of a segment, because the Adversary can then leave the system, causing a permanent loss of the segment. We theoretically prove that segment blockchain can sustain a $(AD/n)^m$ failure probability when the Adversary has no more than $AD$ nodes and every segment is stored by $m$ nodes. The storage requirement is largely reduced compared to the traditional design, making the blockchain more suitable for data-heavy applications.
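To get a feel for the $(AD/n)^m$ expression quoted in this abstract, the sketch below evaluates it and searches for the smallest $m$ that pushes it under a chosen target. Only the formula comes from the abstract; the function names, the target-probability search, and the example numbers are assumptions for illustration.

```python
def failure_bound(ad, n, m):
    """Evaluate (AD / n) ** m for AD adversarial nodes, n nodes, m copies."""
    if not 0 <= ad <= n or n <= 0 or m < 1:
        raise ValueError("need 0 <= ad <= n, n > 0 and m >= 1")
    return (ad / n) ** m


def min_copies(ad, n, target):
    """Smallest m such that (AD / n) ** m <= target (hypothetical helper)."""
    if not 0 <= ad < n or target <= 0:
        raise ValueError("need 0 <= ad < n and target > 0")
    ratio = ad / n
    m, p = 1, ratio
    while p > target:
        p *= ratio
        m += 1
    return m
```

For instance, with half of a 1,000-node network adversarial, `failure_bound(500, 1000, 10)` evaluates $0.5^{10}$, a little under one in a thousand.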
 [18] arXiv:2001.07134 (cross-list from cs.DS) [pdf, other]

Title: High-Quality Hierarchical Process Mapping
Authors: Marcelo Fonseca Faraj, Alexander van der Grinten, Henning Meyerhenke, Jesper Larsson Träff, Christian Schulz
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs on a parallel computer. When the topology of a distributed system is known, an important task is then to map the blocks of the partition onto the processors such that the overall communication cost is reduced. We present novel multilevel algorithms that integrate graph partitioning and process mapping. Important ingredients of our algorithm include fast label propagation, more localized local search, initial partitioning, as well as a compressed data structure to compute processor distances without storing a distance matrix. Experiments indicate that our algorithms speed up the overall mapping process and, due to the integrated multilevel approach, also find much better solutions in practice. For example, one configuration of our algorithm yields better solutions than the previous state of the art in terms of mapping quality while being a factor of 62 faster. Compared to the currently fastest iterated multilevel mapping algorithm, Scotch, we obtain 16% better solutions while investing slightly more running time.
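The idea of computing processor distances without a distance matrix can be sketched for a regular machine hierarchy, where the distance between two processing elements depends only on the lowest level at which they share a group. The abstract does not define its cost function or data structure, so everything below is an assumption: the hierarchy description (`sizes`, `dists`), the weighted-sum objective, and the function names.

```python
def hierarchy_distance(p, q, sizes, dists):
    """Distance between processing elements p and q in a regular hierarchy.

    sizes[i] is the assumed fan-out at level i (lowest level first) and
    dists[i] the assumed cost when p and q first share a group at level i.
    """
    if p == q:
        return 0
    group = 1
    for size, dist in zip(sizes, dists):
        group *= size
        if p // group == q // group:
            return dist
    raise ValueError("PE index outside the hierarchy")


def communication_cost(edges, mapping, sizes, dists):
    """Sum of edge weight times distance between the mapped endpoints."""
    return sum(
        w * hierarchy_distance(mapping[u], mapping[v], sizes, dists)
        for u, v, w in edges
    )
```

With `sizes=[4, 2]` and `dists=[1, 10]` (four cores per processor, two processors per node), a mapping that keeps heavy edges inside one processor gives a lower cost than one that splits them, and each distance is obtained from a few integer divisions instead of a lookup in a quadratic-size table.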
 [19] arXiv:2001.07158 (cross-list from cs.DS) [pdf, other]

Title: Finding temporal patterns using algebraic fingerprints
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
In this paper we study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multiset of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several interesting applications, for example, recommending tours for tourists, or searching for abnormal behavior in a network of financial transactions.
For the family of pattern-detection problems we define, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution can scale to massive graphs with up to a hundred million edges, despite the problems being NP-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and is highly optimized. For example, in a real-world graph dataset with more than six million edges and a multiset query with ten colors, we can extract an optimal solution in less than eight minutes on a Haswell desktop with four cores.
 [20] arXiv:2001.07227 (cross-list from cs.IT) [pdf, other]

Title: Bivariate Polynomial Coding for Exploiting Stragglers in Heterogeneous Coded Computing Systems
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)
Polynomial coding has been proposed as a solution to the straggler mitigation problem in distributed matrix multiplication. Previous works in the literature employ univariate polynomials to encode matrix partitions. Such schemes greatly improve the speed of distributed computing systems by making the task completion time depend only on the fastest workers. However, the work done by the slowest workers, which fail to finish the task assigned to them, is completely ignored. In order to exploit the partial computations of the slower workers, we further decompose the overall matrix multiplication task into even smaller subtasks to better fit workers' storage and computation capacities. In this work, we show that univariate schemes fail to make efficient use of the storage capacity and we propose bivariate polynomial codes. We show that bivariate polynomial codes are a more natural choice to accommodate the additional decomposition of subtasks, as well as heterogeneous storage and computation resources at workers. However, in contrast to univariate polynomial decoding, guaranteeing decodability is much harder for multivariate interpolation. We propose two bivariate polynomial schemes. The first scheme exploits the fact that bivariate interpolation is always possible for a rectangular grid of points. We obtain the rectangular grid of points at the cost of allowing some redundant computations. For the second scheme, we relax the decoding constraint, and require decodability for almost all choices of evaluation points. We present interpolation sets satisfying the almost decodability conditions for certain storage configurations of workers. Our numerical results show that bivariate polynomial coding considerably reduces the completion time of distributed matrix multiplication.
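As background for the univariate baseline this abstract contrasts itself with, the sketch below encodes two row blocks of $A$ and two column blocks of $B$ as polynomials, lets each worker multiply one evaluated pair, and recovers $AB$ from any four finished workers by interpolation. It illustrates the earlier univariate idea only, not the bivariate schemes proposed in the paper; the 2-by-2 partitioning, the integer evaluation points, and all function names are assumptions.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def combine(blocks, coeffs):
    """Entry-wise linear combination sum_i coeffs[i] * blocks[i]."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(c * blk[r][j] for c, blk in zip(coeffs, blocks))
             for j in range(cols)] for r in range(rows)]


def worker_task(A_blocks, B_blocks, x):
    """Worker at point x multiplies A~(x) = A0 + A1 x by B~(x) = B0 + B1 x^2."""
    return matmul(combine(A_blocks, [1, x]), combine(B_blocks, [1, x ** 2]))


def lagrange_basis_coeffs(xs, i):
    """Coefficients (lowest degree first) of the i-th Lagrange basis polynomial."""
    coeffs = [Fraction(1)]
    for j, xj in enumerate(xs):
        if j == i:
            continue
        denom = Fraction(xs[i] - xj)
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):   # multiply by (x - xj) / denom
            nxt[d] += c * (-xj) / denom
            nxt[d + 1] += c / denom
        coeffs = nxt
    return coeffs


def decode(results):
    """Interpolate the degree-3 product; returns [A0B0, A1B0, A0B1, A1B1]."""
    xs = [x for x, _ in results]
    values = [v for _, v in results]
    basis = [lagrange_basis_coeffs(xs, i) for i in range(len(xs))]
    return [combine(values, [basis[i][d] for i in range(len(xs))])
            for d in range(len(xs))]


def assemble(c):
    """Place the four product blocks back into the full result matrix."""
    top = [r0 + r1 for r0, r1 in zip(c[0], c[2])]
    bottom = [r0 + r1 for r0, r1 in zip(c[1], c[3])]
    return top + bottom


def demo():
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]                          # 4 x 2
    B = [[1, 0, 2, 1], [0, 1, 1, 3]]                              # 2 x 4
    A_blocks = [A[:2], A[2:]]                                     # row blocks
    B_blocks = [[row[:2] for row in B], [row[2:] for row in B]]   # column blocks
    finished = [2, 4, 5, 6]   # of six workers at x = 1..6, two straggle
    results = [(x, worker_task(A_blocks, B_blocks, x)) for x in finished]
    return assemble(decode(results)), matmul(A, B)
```

In this toy setting the results of the two stragglers are simply discarded, which is the wasted work the abstract sets out to exploit.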
 [21] arXiv:2001.07297 (cross-list from cs.CR) [pdf, other]

Title: PoAh: A Novel Consensus Algorithm for Fast Scalable Private Blockchain for Large-scale IoT Frameworks
Comments: 26 pages, 18 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
In today's connected world, resource-constrained devices are deployed for sensing and decision-making applications, ranging from smart cities to environmental monitoring. Those resource-constrained devices are connected to create real-time distributed networks popularly known as the Internet of Things (IoT), fog computing and edge computing. The blockchain is gaining a lot of interest in these domains to secure the system by avoiding centralized dependencies, where proof-of-work (PoW) plays a vital role in making the whole security solution decentralized. Due to the resource limitations of the devices, PoW is not suitable for blockchain-based security solutions. This paper presents a novel consensus algorithm called Proof-of-Authentication (PoAh), which introduces a cryptographic authentication mechanism to replace PoW for resource-constrained devices, and to make the blockchain application-specific. PoAh is thus suitable for private as well as permissioned blockchains. Further, PoAh not only secures the systems, but also maintains system sustainability and scalability. The proposed consensus algorithm is evaluated theoretically in simulation scenarios, and in real-time hardware testbeds to validate its performance. Finally, PoAh and its integration with the blockchain in the IoT and edge computing scenarios are discussed. The proposed PoAh, while running on limited computing resources (e.g. single-board computing devices like the Raspberry Pi), has a latency on the order of 3 seconds.
 [22] arXiv:2001.07463 (crosslist from cs.LG) [pdf, ps, other]

Title: Fast Sequence-Based Embedding with Diffusion Graphs
Comments: Source code available at: this https URL
Journal-ref: CompleNet 2018
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
A graph embedding is a representation of graph vertices in a low-dimensional space, which approximately preserves properties such as distances between nodes. Vertex sequence-based embedding procedures use features extracted from linear sequences of nodes to create embeddings using a neural network. In this paper, we propose diffusion graphs as a method to rapidly generate vertex sequences for network embedding. Its computational efficiency is superior to previous methods due to simpler sequence generation, and it produces more accurate results. In experiments, we found that the performance relative to other methods improves with increasing edge density in the graph. In a community detection task, clustering nodes in the embedding space produces better results compared to other sequence-based embedding methods.
 [23] arXiv:2001.07504 (crosslist from cs.AI) [pdf, other]

Title: Combining Federated and Active Learning for Communication-efficient Distributed Failure Prediction in Aeronautics
Authors: Nicolas Aussel (INF, ACMES-SAMOVAR, IP Paris), Sophie Chabridon (IP Paris, INF, ACMES-SAMOVAR), Yohan Petetin (TIPIC-SAMOVAR, CITI, IP Paris)
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Machine Learning has proven useful in recent years as a way to achieve failure prediction for industrial systems. However, the high computational resources necessary to run learning algorithms are an obstacle to its widespread application. The subfield of Distributed Learning offers a solution to this problem by enabling the use of remote resources, but at the expense of introducing communication costs in the application that are not always acceptable. In this paper, we propose a distributed learning approach able to optimize the use of computational and communication resources to achieve excellent learning model performance through a centralized architecture. To achieve this, we present a new centralized distributed learning algorithm that relies on the learning paradigms of Active Learning and Federated Learning to offer a communication-efficient method with guarantees of model precision on both the clients and the central server. We evaluate this method on a public benchmark and show that its precision is very close to the state-of-the-art performance level of non-distributed learning despite additional constraints.
Replacements for Wed, 22 Jan 20
 [24] arXiv:1606.06025 (replaced) [pdf, other]

Title: Efficient and High-quality Sparse Graph Coloring on the GPU
Comments: arXiv admin note: text overlap with arXiv:1205.3809 by other authors
Journal-ref: Concurrency and Computation: Practice and Experience, Volume 29, Issue 10, 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [25] arXiv:1801.09805 (replaced) [pdf, other]

Title: Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training
Journal-ref: SysML 2018
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [26] arXiv:1804.09494 (replaced) [pdf, other]

Title: On Optimizing Distributed Tucker Decomposition for Sparse Tensors
Authors: Venkatesan T. Chakaravarthy, Jee W. Choi, Douglas J. Joseph, Prakash Murali, Shivmaran S. Pandian, Yogish Sabharwal, Dheeraj Sreedhar
Comments: Abridged version of the paper to appear in the proceedings of ICS'18
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [27] arXiv:1805.07891 (replaced) [pdf, other]

Title: Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
 [28] arXiv:1905.02119 (replaced) [pdf, ps, other]

Title: Lynceus: Cost-efficient Tuning and Provisioning of Data Analytic Jobs
Comments: This updated version features a novel extension of our approach: the timeout mechanism. Additionally, we improved the write-up of the paper as a result of our collaboration with Professor David Garlan and Carnegie Mellon University
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [29] arXiv:1910.10434 (replaced) [pdf, ps, other]

Title: Divide and Scale: Formalization of Distributed Ledger Sharding Protocols
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [30] arXiv:1911.06969 (replaced) [pdf, other]

Title: Pangolin: An Efficient and Flexible Graph Pattern Mining System on CPU and GPU
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [31] arXiv:2001.01603 (replaced) [pdf, other]

Title: GeoBroker: Leveraging Geo-Contexts for IoT Data Distribution
Comments: Accepted for publication in Elsevier Computer Communications
Journal-ref: Computer Communications 151 (2020) 473-484
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [32] arXiv:2001.03836 (replaced) [pdf, ps, other]

Title: Private and Communication-Efficient Edge Learning: A Sparse Differential Gaussian-Masking Distributed SGD Approach
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
 [33] arXiv:1902.04899 (replaced) [pdf, other]

Title: Local approximation of the Maximum Cut in regular graphs
Comments: 19 pages, 6 figures. Full version of a paper accepted in the 45th International Workshop on Graph-Theoretic Concepts in Computer Science (WG 2019)
Subjects: Combinatorics (math.CO); Distributed, Parallel, and Cluster Computing (cs.DC)
 [34] arXiv:1906.08986 (replaced) [pdf, other]

Title: Database Meets Deep Learning: Challenges and Opportunities
Comments: The first version of this paper has appeared in SIGMOD Record. In this (third) version, we extend it to include the recent developments in this field and references to recent work (especially for section 3.2 and section 4.2)
Journal-ref: ACM SIGMOD Record, Volume 45 Issue 2, June 2016, Pages 17-22
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
 [35] arXiv:1906.10575 (replaced) [pdf, other]

Title: Parallel Performance of Algebraic Multigrid Domain Decomposition (AMG-DD)
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
 [36] arXiv:1908.11488 (replaced) [pdf, ps, other]

Title: Quantum Distributed Algorithm for Triangle Finding in the CONGEST Model
Comments: 13 pages; v2: minor corrections; to appear in the proceedings of STACS'20
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)
 [37] arXiv:1909.04746 (replaced) [pdf, other]

Title: Tighter Theory for Local SGD on Identical and Heterogeneous Data
Comments: 30 pages, 1 algorithm, 5 theorems, 5 corollaries, 14 lemmas, 2 propositions, 5 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [38] arXiv:1911.04200 (replaced) [pdf, other]

Title: Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Authors: Maciej Besta, Raghavendra Kanakagiri, Harun Mustafa, Mikhail Karasikov, Gunnar Rätsch, Torsten Hoefler, Edgar Solomonik
Subjects: Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Genomics (q-bio.GN)
 [39] arXiv:1911.07154 (replaced) [pdf, ps, other]

Title: Sparse Hopsets in Congested Clique
Authors: Yasamin Nazari
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)