Distributed, Parallel, and Cluster Computing

New submissions

New submissions for Wed, 22 Jan 20

[1]  arXiv:2001.06778 [pdf, other]
Title: CycLedger: A Scalable and Secure Parallel Protocol for Distributed Ledger via Sharding
Comments: short version in IPDPS 2020
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

Traditional public distributed ledgers have not been able to scale out well and work efficiently. Sharding is deemed a promising way to solve this problem. By partitioning all nodes into small committees and letting them work in parallel, we can significantly lower the amount of communication and computation, reduce the overhead on each node's storage, and enhance the throughput of the distributed ledger. Existing sharding-based protocols still suffer from several serious drawbacks. First, all honest nodes must be well connected with each other, which demands a huge number of communication channels in the network. Moreover, previous protocols suffer a great loss in efficiency when the honesty of each committee's leader is in question. At the same time, no explicit incentive is provided for nodes to actively participate in the protocol.
We present CycLedger, a scalable and secure parallel protocol for distributed ledger via sharding. Our protocol selects a leader and a partial set for each committee, who are in charge of maintaining intra-shard consensus and communicating with other committees, to reduce the amortized complexity of communication, computation and storage on all nodes. We introduce a novel commitment scheme between committees and a recovery procedure to prevent the system from crashing even when leaders of committees are malicious. To add incentive for the network, we use the concept of reputation, which measures each node's computing power. As nodes with higher reputation receive more rewards, there is an encouragement for nodes with strong computing ability to work honestly so as to gain reputation. In this way, we strike out a new path to establish scalability, security and incentive for the sharding-based distributed ledger.

[2]  arXiv:2001.06935 [pdf, other]
Title: 75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices
Comments: 3 pages, 2 figures, 28 references, accepted to Northeast Database Day (NEDB) 2020. arXiv admin note: substantial text overlap with arXiv:1907.04217
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Data Structures and Algorithms (cs.DS); Performance (cs.PF); Social and Information Networks (cs.SI)

The SuiteSparse GraphBLAS C-library implements high performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into a hypersparse matrix. The parameters of hierarchical hypersparse matrices rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
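
The cascading idea described in this abstract can be illustrated with a small sketch. It is not the GraphBLAS API: plain Python dictionaries stand in for hypersparse matrices, and the class name `HierarchicalSparse` and the `cuts` parameter are hypothetical names chosen for illustration. Each level holds at most `cuts[i]` stored entries; once a level exceeds its cut, it is added into the next level and cleared.

```python
from collections import defaultdict


class HierarchicalSparse:
    """Toy hierarchical accumulator: level 0 is small, higher levels are larger."""

    def __init__(self, cuts):
        # cuts[i] is the (hypothetical) maximum number of stored entries in
        # level i before it is cascaded into level i + 1. The last level has
        # no cut and simply absorbs everything.
        self.cuts = list(cuts)
        self.levels = [defaultdict(int) for _ in range(len(self.cuts) + 1)]

    def update(self, row, col, value=1):
        # Every streaming update touches only the smallest level.
        self.levels[0][(row, col)] += value
        self._cascade()

    def _cascade(self):
        for i, cut in enumerate(self.cuts):
            if len(self.levels[i]) <= cut:
                break  # this level still fits, so higher levels are unchanged
            for key, value in self.levels[i].items():
                self.levels[i + 1][key] += value
            self.levels[i].clear()

    def total(self):
        # The full matrix is the sum of all levels.
        out = defaultdict(int)
        for level in self.levels:
            for key, value in level.items():
                out[key] += value
        return dict(out)
```

Because addition is linear, summing the levels in `total()` yields the same result as applying every update to one large structure, while most updates only touch the small first level.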

[3]  arXiv:2001.06989 [pdf, other]
Title: The Parallelism Motifs of Genomic Data Analysis
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.

[4]  arXiv:2001.07000 [pdf, ps, other]
Title: Contract-connection: An efficient communication protocol for Distributed Ledger Technology
Journal-ref: 2019 IEEE 38th International Performance Computing and Communications Conference (IPCCC)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)

Distributed Ledger Technology (DLT) is promising to become the foundation of many decentralised systems. However, the unbalanced and unregulated network layout contributes to the inefficiency of DLT, especially in Internet of Things (IoT) environments, where nodes connect to only a limited number of peers. The data communication speed globally is unbalanced and does not live up to the constraints of efficient real-time distributed systems. In this paper, we introduce a new communication protocol, which enables nodes to calculate the tradeoff between connecting/disconnecting a peer in a completely decentralised manner. The network layout globally is continuously re-balancing and optimising as nodes adjust their peers. This communication protocol reduces the inequality of the communication network. The experiment suggests this communication protocol is stable and efficient.

[5]  arXiv:2001.07022 [pdf, other]
Title: BAASH: Enabling Blockchain-as-a-Service on High-Performance Computing Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

The state-of-the-art approach to manage blockchains is to process blocks of transactions in a shared-nothing environment. Although blockchains have the potential to provide various services for high-performance computing (HPC) systems, HPC will not be able to embrace blockchains before the following two missing pieces become available: (i) new consensus protocols being aware of the shared-storage architecture in HPC, and (ii) new fault-tolerant mechanisms compensating for HPC's programming model---the message passing interface (MPI)---that is vulnerable under blockchain-like workloads. To this end, we design a new set of consensus protocols crafted for the HPC platforms and a new fault-tolerance subsystem compensating for the failures caused by faulty MPI processes. Built on top of the new protocols and fault-tolerance mechanism, a prototype system is implemented and evaluated with two million transactions on a 500-core HPC cluster, showing $6\times$, $12\times$, and $75\times$ higher throughput than Hyperledger, Ethereum, and Parity, respectively.

[6]  arXiv:2001.07077 [pdf, other]
Title: Distributed Vehicular Computing at the Dawn of 5G: a Survey
Comments: 34 pages, 10 figures, Journal
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Recent advances in information technology have revolutionized the automotive industry, paving the way for next-generation smart and connected vehicles. Connected vehicles can collaborate to deliver novel services and applications. These services and applications require 1) massive volumes of data that perceive ambient environments, 2) ultra-reliable and low-latency communication networks, 3) real-time data processing which provides decision support under application-specific constraints. Addressing such constraints introduces significant challenges with current communication and computation technologies. Coincidentally, the fifth generation of cellular networks (5G) was developed to respond to communication challenges by providing an infrastructure for low-latency, high-reliability, and high bandwidth communication. At the core of this infrastructure, edge computing allows data offloading and computation at the edge of the network, ensuring low-latency and context-awareness, and pushing the utilization efficiency of 5G to its limit. In this paper, we aim at providing a comprehensive overview of the state of research on vehicular computing in the emerging age of 5G. After reviewing the main vehicular applications requirements and challenges, we follow a bottom-up approach, starting with the promising technologies for vehicular communications, all the way up to Artificial Intelligence (AI) solutions. We explore the various architectures for vehicular computing, including centralized Cloud Computing, Vehicular Cloud Computing, and Vehicular Edge computing, and investigate the potential data analytics technologies and their integration on top of the vehicular computing architectures. We finally discuss several future research directions and applications for vehicular computation systems.

[7]  arXiv:2001.07086 [pdf, other]
Title: 2PS: High-Quality Edge Partitioning with Two-Phase Streaming
Comments: in submission
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Graph partitioning is an important preprocessing step to distributed graph processing. In edge partitioning, the edge set of a given graph is split into $k$ equally-sized partitions, such that the replication of vertices across partitions is minimized. Streaming is a viable approach to partition graphs that exceed the memory capacities of a single server. The graph is ingested as a stream of edges, and one edge at a time is immediately and irrevocably assigned to a partition based on a scoring function. However, streaming partitioning suffers from the uninformed assignment problem: At the time of partitioning early edges in the stream, there is no information available about the rest of the edges. As a consequence, edge assignments are often driven by balancing considerations, and the achieved replication factor is comparatively high. In this paper, we propose 2PS, a novel two-phase streaming algorithm for high-quality edge partitioning. In the first phase, vertices are separated into clusters by a lightweight streaming clustering algorithm. In the second phase, the graph is re-streamed and edge partitioning is performed while taking into account the clustering of the vertices from the first phase. Our evaluations show that 2PS can achieve a replication factor that is comparable to heavy-weight random access partitioners while inducing orders of magnitude lower memory overhead.
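
To make the terms in this abstract concrete, here is a minimal sketch of single-pass streaming edge partitioning and of the replication factor. It is a generic greedy baseline, not the 2PS algorithm: the scoring rule, the `capacity` parameter, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def greedy_stream_partition(edges, k, capacity):
    """Assign each edge once, in stream order, to one of k partitions."""
    loads = [0] * k
    replicas = defaultdict(set)  # vertex -> partitions holding a copy of it
    assignment = []
    for u, v in edges:
        best, best_key = None, None
        for p in range(k):
            if loads[p] >= capacity:
                continue  # partition is full, keeps the partitions balanced
            # Score: how many endpoints already live on p; ties go to the
            # less loaded partition, then to the lower partition index.
            key = ((p in replicas[u]) + (p in replicas[v]), -loads[p])
            if best_key is None or key > best_key:
                best, best_key = p, key
        if best is None:
            raise ValueError("capacity too small for the edge stream")
        loads[best] += 1
        replicas[u].add(best)
        replicas[v].add(best)
        assignment.append(best)
    return assignment, replicas


def replication_factor(replicas):
    """Average number of partitions on which a vertex is replicated."""
    return sum(len(parts) for parts in replicas.values()) / len(replicas)
```

On a stream of two disjoint triangles with `k=2` and `capacity=3`, this baseline places each triangle on its own partition, so every vertex has one replica. On less favorable stream orders the early edges are placed with no knowledge of later ones, which is the uninformed assignment problem the abstract describes.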

[8]  arXiv:2001.07091 [pdf, other]
Title: Blockchain Consensuses Algorithms: A Survey
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

In recent years, blockchain technology has received unparalleled attention from academia, industry, and governments all around the world. It is considered a technological breakthrough anticipated to disrupt several application domains. This has resulted in a plethora of blockchain systems for various purposes. However, many of these blockchain systems suffer from serious shortcomings related to their performance and security, which need to be addressed before any wide-scale adoption can be achieved. A crucial component of any blockchain system is its underlying consensus algorithm, which in many ways determines its performance and security. Therefore, to address the limitations of different blockchain systems, several existing as well as novel consensus algorithms have been introduced. A systematic analysis of these algorithms will help to understand how and why any particular blockchain performs the way it does. However, the existing studies of consensus algorithms are not comprehensive. Those studies have incomplete discussions on the properties of the algorithms and fail to analyse several major blockchain consensus algorithms in terms of their scopes. This article fills this gap by analysing a wide range of consensus algorithms using a comprehensive taxonomy of properties and by examining in detail the implications of different issues still prevalent in consensus algorithms. The result of the analysis is presented in tabular formats, which provide a visual illustration of these algorithms in a meaningful way. We have also analysed more than a hundred top crypto-currencies belonging to different categories of consensus algorithms to understand their properties and to infer different trends in these crypto-currencies. Finally, we have presented a decision tree of algorithms to be used as a tool to test the suitability of consensus algorithms under different criteria.

[9]  arXiv:2001.07103 [pdf, ps, other]
Title: OpenMP Parallelization of Dynamic Programming and Greedy Algorithms
Authors: Claude Tadonki
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Multicore has emerged as a typical architecture model since its advent and stands now as a standard. The trend is to increase the number of cores and improve the performance of the memory system. Providing an efficient multicore implementation for an important algorithmic kernel is a genuine contribution. From a methodology standpoint, this should be done at the level of the underlying paradigm, if any. In this paper, we study the cases of dynamic programming and greedy algorithms, which are two major algorithmic paradigms. We exclusively consider directives-based loop parallelization through OpenMP and investigate necessary pre-transformations to reach a regular parallel form. We evaluate our methodology with a selection of well-known combinatorial optimization problems on an Intel Broadwell processor. Key points for scalability are discussed before and after experimental results. Our immediate perspective is to extend our study to the manycore case, with a special focus on NUMA configurations.

[10]  arXiv:2001.07104 [pdf, other]
Title: A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Performance (cs.PF)

Characterizing compute kernel execution behavior on GPUs for efficient task scheduling is a nontrivial task. We address this with a simple model enabling portable and fast predictions among different GPUs using only extracted hardware-independent features. This model is built based on random forests using 189 individual compute kernels from benchmarks such as Parboil, Rodinia, Polybench-GPU and SHOC. Evaluation of the model performance using cross-validation yields a median Mean Average Percentage Error (MAPE) of [13.45%, 44.56%] and [1.81%, 2.91%] for time and power prediction, respectively, on five different GPUs, while latency for a single prediction varies between 0.1 and 0.2 seconds.
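
For readers unfamiliar with the metric, the sketch below computes MAPE under the assumption that the abstract uses the common mean absolute percentage error definition; the listing does not spell out the formula, and the function name is hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes the common definition: the mean of |actual - predicted| / |actual|
    over all samples, multiplied by 100. Actual values must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

For example, predictions of 110 and 180 against measured values of 100 and 200 are each off by 10%, so the MAPE is 10%.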

[11]  arXiv:2001.07276 [pdf]
Title: Transparently Capturing Request Execution Path for Anomaly Detection
Comments: 13 pages, 7 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

With the increasing scale and complexity of cloud systems and big data analytics platforms, it is becoming more and more challenging to understand and diagnose the processing of a service request in such distributed platforms. One way that helps to deal with this problem is to capture the complete end-to-end execution path of service requests among all involved components accurately. This paper presents REPTrace, a generic methodology for capturing such execution paths in a transparent fashion. We analyze a comprehensive list of execution scenarios, and propose principles and algorithms for generating the end-to-end request execution path for all the scenarios. Moreover, this paper presents an anomaly detection approach exploiting request execution paths to detect anomalies of the execution during request processing. The experiments on four popular distributed platforms with different workloads show that REPTrace can transparently capture the accurate request execution path with reasonable latency and negligible network overhead. Fault injection experiments show that execution anomalies are detected with high recall (96%).

[12]  arXiv:2001.07490 [pdf, other]
Title: Serverless Straggler Mitigation using Local Error-Correcting Codes
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Information Theory (cs.IT)

Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication and evaluate them on several common applications from machine learning and high-performance computing. The proposed schemes are inspired by error-correcting codes and employ parallel encoding and decoding over the data stored in the cloud using serverless workers. This creates a fully distributed computing framework without using a master node to conduct encoding or decoding, which removes the computation, communication and storage bottleneck at the master. On the theory side, we establish that our proposed scheme is asymptotically optimal in terms of decoding time and provide a lower bound on the number of stragglers it can tolerate with high probability. Through extensive experiments, we show that our scheme outperforms existing schemes such as speculative execution and other coding theoretic methods by at least 25%.

[13]  arXiv:2001.07496 [pdf]
Title: Self Organization Agent Oriented Dynamic Resource Allocation on Open Federated Clouds Environment
Authors: Kemchi Sofiane, Abdelhafid Zitouni (LIRE), Mahieddine Djoudi (TECHNÉ - EA 6316)
Journal-ref: International Conference on Cloud Computing Technologies and Applications, CLOUDTECH 2016, May 2016, Marrakech, Morocco
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

To ensure uninterrupted services to cloud clients from federated cloud providers, it is important to guarantee an efficient allocation of cloud resources to users, improving the rate of client satisfaction and the quality of service provision. It is better to obtain as many computing and storage resources as possible. In the cloud domain, several Multi Agent Resource Allocation methods have been proposed to address the problem of dynamic resource allocation. However, the problem is still open and much work remains to be done in this field. Robustness is important in cloud computing, so in this paper we focus on an auto-adaptive method to deal with changes in the open federated cloud computing environment. Our approach is hybrid: we first adopt an existing organization optimization approach for self-organization in the broker agent organization, and combine it with an existing Multi Agent Resource Allocation approach on Federated Clouds. We consider an open cloud federation environment that is dynamic and in constant evolution, in which new cloud operators can join or leave the federation. At the same time, our approach is multi-criterion and can take into account various parameters (e.g., computing load balance of the mediator agent, geographical distance (network delay) between customer and provider...).

[14]  arXiv:2001.07497 [pdf]
Title: An IoT Platform-as-a-service for NFV Based -- Hybrid Cloud / Fog Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Software Engineering (cs.SE)

Cloud computing, despite its inherent advantages (e.g., resource efficiency), still faces several challenges. The wide area network used to connect the cloud to end-users could cause high latency, which may not be tolerable for some applications, especially Internet of Things (IoT) applications. Fog computing can reduce this latency by extending the traditional cloud architecture to the edge of the network and by enabling the deployment of some application components on fog nodes. Application providers use Platform-as-a-Service (PaaS) to provision (i.e., develop, deploy, manage, and orchestrate) applications in the cloud. However, existing PaaS solutions (including IoT PaaS) usually focus on the cloud and do not enable provisioning of applications with components spanning cloud and fog. Provisioning such applications requires novel functions, such as application graph generation, that are absent from existing PaaS. Furthermore, several functions offered by existing PaaS (e.g., publication/discovery) need to be significantly extended in order to fit in a hybrid cloud/fog environment. In this paper, we propose a novel architecture for PaaS for hybrid cloud/fog systems. It is IoT use case-driven, and its applications' components are implemented as Virtual Network Functions (VNFs) with execution sequences modeled as graphs with sub-structures such as selection and loops. It automates the provisioning of applications with components spanning cloud and fog. In addition, it enables the discovery of existing cloud and fog nodes and generates application graphs. A proof of concept is built based on the Cloudify open source platform. Feasibility is demonstrated by evaluating its performance when PaaS modules and application components are placed in clouds and fogs in different geographical locations.

[15]  arXiv:2001.07557 [pdf, other]
Title: Lattice QCD on a novel vector architecture
Comments: Proceedings of Lattice 2019, 7 pages, 6 colorful figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); High Energy Physics - Lattice (hep-lat)

The SX-Aurora TSUBASA PCIe accelerator card is the newest model of NEC's SX architecture family. Its multi-core vector processor features a vector length of 16 kbits and interfaces with up to 48 GB of HBM2 memory in the current models, available since 2018. The compute performance is up to 2.45 TFlop/s peak in double precision, and the memory throughput is up to 1.2 TB/s peak. New models with improved performance characteristics are announced for the near future. In this contribution we discuss key aspects of the SX-Aurora and describe how we enabled the architecture in the Grid Lattice QCD framework.

Cross-lists for Wed, 22 Jan 20

[16]  arXiv:2001.07016 (cross-list from cs.CR) [pdf, other]
Title: BlockHouse: Blockchain-based Distributed Storehouse System
Comments: Published in "9TH Latin-American Symposium on Dependable Computing", 2019
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

In this paper we propose BlockHouse, a decentralized/P2P storage system fully based on private blockchains. Each participant can rent out unused storage in order to host the data of other members. This system uses a dual Smart Contract and Proof of Retrievability system to automatically check at a fixed frequency whether the file is still hosted. In addition to transparency, the blockchain allows better integration of all payments associated with this type of system (regular payments, sequestration to ensure good behavior of users, ...). Except for the data transferred between the client and the server, all actions go through a smart contract in the blockchain in order to log, pay for, and secure the entire storage process.

[17]  arXiv:2001.07023 (cross-list from cs.CR) [pdf, ps, other]
Title: Segment blockchain: A size reduced storage mechanism for blockchain
Journal-ref: IEEE Access,2020. https://ieeexplore.ieee.org/document/8957450/
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

The exponential growth of the blockchain size has become a major contributing factor that hinders the decentralisation of blockchain and its potential implementations in data-heavy applications. In this paper, we propose segment blockchain, an approach that segmentises blockchain and enables nodes to only store a copy of one blockchain segment. We use \emph{PoW} as a membership threshold to limit the number of nodes taken by an Adversary---the Adversary can only gain at most $n/2$ of nodes in a network of $n$ nodes when it has $50\%$ of the calculation power in the system (the Nakamoto blockchain security threshold). A segment blockchain system fails when an Adversary stores all copies of a segment, because the Adversary can then leave the system, causing a permanent loss of the segment. We theoretically prove that segment blockchain can sustain a $(AD/n)^m$ failure probability when the Adversary has no more than $AD$ nodes and every segment is stored by $m$ nodes. The storage requirement is substantially reduced compared to the traditional design, making the blockchain more suitable for data-heavy applications.
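
The $(AD/n)^m$ expression stated in this abstract is easy to evaluate numerically. The sketch below only plugs numbers into that expression; the function names are hypothetical, and the helper that searches for the smallest $m$ is an illustrative addition rather than something taken from the paper.

```python
def segment_failure_bound(adversary_nodes, total_nodes, copies):
    """Evaluate (AD / n) ** m as stated in the abstract."""
    if not 0 <= adversary_nodes <= total_nodes or total_nodes < 1 or copies < 1:
        raise ValueError("need 0 <= AD <= n, n >= 1 and m >= 1")
    return (adversary_nodes / total_nodes) ** copies


def min_copies(adversary_nodes, total_nodes, target):
    """Smallest m for which (AD / n) ** m does not exceed the target."""
    if adversary_nodes >= total_nodes:
        raise ValueError("no finite m exists when AD >= n")
    copies = 1
    while segment_failure_bound(adversary_nodes, total_nodes, copies) > target:
        copies += 1
    return copies
```

With the Adversary at the $n/2$ threshold, ten copies per segment give $(1/2)^{10} \approx 0.001$, and twenty copies are needed to push the expression below $10^{-6}$.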

[18]  arXiv:2001.07134 (cross-list from cs.DS) [pdf, other]
Title: High-Quality Hierarchical Process Mapping
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)

Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs on a parallel computer. When the topology of a distributed system is known, an important task is then to map the blocks of the partition onto the processors such that the overall communication cost is reduced. We present novel multilevel algorithms that integrate graph partitioning and process mapping. Important ingredients of our algorithm include fast label propagation, more localized local search, initial partitioning, as well as a compressed data structure to compute processor distances without storing a distance matrix. Experiments indicate that our algorithms speed up the overall mapping process and, due to the integrated multilevel approach, also find much better solutions in practice. For example, one configuration of our algorithm yields better solutions than the previous state-of-the-art in terms of mapping quality while being a factor of 62 faster. Compared to the currently fastest iterated multilevel mapping algorithm Scotch, we obtain 16% better solutions while investing slightly more running time.

[19]  arXiv:2001.07158 (cross-list from cs.DS) [pdf, other]
Title: Finding temporal patterns using algebraic fingerprints
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)

In this paper we study a family of pattern-detection problems in vertex-colored temporal graphs. In particular, given a vertex-colored temporal graph and a multi-set of colors as a query, we search for temporal paths in the graph that contain the colors specified in the query. These types of problems have several interesting applications, for example, recommending tours for tourists, or searching for abnormal behavior in a network of financial transactions.
For the family of pattern-detection problems we define, we establish complexity results and design an algebraic-algorithmic framework based on constrained multilinear sieving. We demonstrate that our solution can scale to massive graphs with up to a hundred million edges, despite the problems being NP-hard. Our implementation, which is publicly available, exhibits practical edge-linear scalability and is highly optimized. For example, in a real-world graph dataset with more than six million edges and a multi-set query with ten colors, we can extract an optimal solution in less than eight minutes on a Haswell desktop with four cores.

[20]  arXiv:2001.07227 (cross-list from cs.IT) [pdf, other]
Title: Bivariate Polynomial Coding for Exploiting Stragglers in Heterogeneous Coded Computing Systems
Subjects: Information Theory (cs.IT); Distributed, Parallel, and Cluster Computing (cs.DC)

Polynomial coding has been proposed as a solution to the straggler mitigation problem in distributed matrix multiplication. Previous works in the literature employ univariate polynomials to encode matrix partitions. Such schemes greatly improve the speed of distributed computing systems by making the task completion time depend only on the fastest workers. However, the work done by the slowest workers, which fail to finish the task assigned to them, is completely ignored. In order to exploit the partial computations of the slower workers, we further decompose the overall matrix multiplication task into even smaller subtasks to better fit workers' storage and computation capacities. In this work, we show that univariate schemes fail to make efficient use of the storage capacity, and we propose bivariate polynomial codes. We show that bivariate polynomial codes are a more natural choice to accommodate the additional decomposition of subtasks, as well as heterogeneous storage and computation resources at workers. However, in contrast to univariate polynomial decoding, guaranteeing decodability is much harder for multivariate interpolation. We propose two bivariate polynomial schemes. The first scheme exploits the fact that bivariate interpolation is always possible for a rectangular grid of points. We obtain the rectangular grid of points at the cost of allowing some redundant computations. For the second scheme, we relax the decoding constraint, and require decodability for almost all choices of evaluation points. We present interpolation sets satisfying the almost decodability conditions for certain storage configurations of workers. Our numerical results show that bivariate polynomial coding considerably reduces the completion time of distributed matrix multiplication.
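
As background for the univariate baseline this abstract mentions, the following sketch encodes row blocks of a matrix A as coefficients of a polynomial, lets each worker multiply one evaluation by B, and decodes from any k finished workers by solving a Vandermonde system. It does not implement the bivariate schemes proposed in the paper; all function names and the exact-arithmetic decoder are assumptions made for illustration.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain dense matrix product on lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def encode(blocks, x):
    """Evaluate the matrix polynomial sum_i blocks[i] * x**i at the point x."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(Fraction(blk[r][c]) * x ** i for i, blk in enumerate(blocks))
             for c in range(cols)] for r in range(rows)]


def solve(matrix, rhs):
    """Gauss-Jordan elimination over exact fractions (matrix must be invertible)."""
    n = len(matrix)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def decode(points, results):
    """Recover every block product blocks[i] @ B from k = len(points) results."""
    k = len(points)
    vandermonde = [[Fraction(x) ** i for i in range(k)] for x in points]
    rows, cols = len(results[0]), len(results[0][0])
    out = [[[None] * cols for _ in range(rows)] for _ in range(k)]
    for r in range(rows):
        for c in range(cols):
            coeffs = solve(vandermonde, [res[r][c] for res in results])
            for i in range(k):
                out[i][r][c] = coeffs[i]
    return out
```

With two row blocks and three workers at distinct evaluation points, the results of any two workers suffice, so one straggler can be ignored entirely. That discarded partial work is exactly what the abstract argues should instead be exploited.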

[21]  arXiv:2001.07297 (cross-list from cs.CR) [pdf, other]
Title: PoAh: A Novel Consensus Algorithm for Fast Scalable Private Blockchain for Large-scale IoT Frameworks
Comments: 26 pages, 18 figures
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)

In today's connected world, resource-constrained devices are deployed for sensing and decision-making applications, ranging from smart cities to environmental monitoring. Those resource-constrained devices are connected to create real-time distributed networks popularly known as the Internet of Things (IoT), fog computing and edge computing. The blockchain is gaining a lot of interest in these domains to secure the system by avoiding centralized dependencies, where proof-of-work (PoW) plays a vital role in making the whole security solution decentralized. Due to the resource limitations of the devices, PoW is not suitable for blockchain-based security solutions. This paper presents a novel consensus algorithm called Proof-of-Authentication (PoAh), which introduces a cryptographic authentication mechanism to replace PoW for resource-constrained devices, and to make the blockchain application-specific. PoAh is thus suitable for private as well as permissioned blockchains. Further, PoAh not only secures the systems, but also maintains system sustainability and scalability. The proposed consensus algorithm is evaluated theoretically in simulation scenarios, and in real-time hardware testbeds to validate its performance. Finally, PoAh and its integration with the blockchain in the IoT and edge computing scenarios are discussed. The proposed PoAh, while running on limited computing resources (e.g. single-board computing devices like the Raspberry Pi), has a latency on the order of 3 seconds.

[22]  arXiv:2001.07463 (cross-list from cs.LG) [pdf, ps, other]
Title: Fast Sequence-Based Embedding with Diffusion Graphs
Comments: Source code available at: this https URL
Journal-ref: CompleNet 2018
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Social and Information Networks (cs.SI); Machine Learning (stat.ML)

A graph embedding is a representation of graph vertices in a low-dimensional space, which approximately preserves properties such as distances between nodes. Vertex sequence-based embedding procedures use features extracted from linear sequences of nodes to create embeddings using a neural network. In this paper, we propose diffusion graphs as a method to rapidly generate vertex sequences for network embedding. Its computational efficiency is superior to previous methods due to simpler sequence generation, and it produces more accurate results. In experiments, we found that the performance relative to other methods improves with increasing edge density in the graph. In a community detection task, clustering nodes in the embedding space produces better results compared to other sequence-based embedding methods.

[23]  arXiv:2001.07504 (cross-list from cs.AI) [pdf, other]
Title: Combining Federated and Active Learning for Communication-efficient Distributed Failure Prediction in Aeronautics
Authors: Nicolas Aussel (INF, ACMES-SAMOVAR, IP Paris), Sophie Chabridon (IP Paris, INF, ACMES-SAMOVAR), Yohan Petetin (TIPIC-SAMOVAR, CITI, IP Paris)
Subjects: Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)

Machine Learning has proven useful in recent years as a way to achieve failure prediction for industrial systems. However, the high computational resources necessary to run learning algorithms are an obstacle to its widespread application. The sub-field of Distributed Learning offers a solution to this problem by enabling the use of remote resources, but at the expense of introducing communication costs in the application that are not always acceptable. In this paper, we propose a distributed learning approach able to optimize the use of computational and communication resources to achieve excellent learning model performance through a centralized architecture. To achieve this, we present a new centralized distributed learning algorithm that relies on the learning paradigms of Active Learning and Federated Learning to provide a communication-efficient method that offers guarantees of model precision on both the clients and the central server. We evaluate this method on a public benchmark and show that its precision is very close to the state-of-the-art performance level of non-distributed learning despite additional constraints.

Replacements for Wed, 22 Jan 20

[24]  arXiv:1606.06025 (replaced) [pdf, other]
Title: Efficient and High-quality Sparse Graph Coloring on the GPU
Comments: arXiv admin note: text overlap with arXiv:1205.3809 by other authors
Journal-ref: Concurrency and Computation: Practice and Experience, Volume 29, Issue 10, 2017
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[25]  arXiv:1801.09805 (replaced) [pdf, other]
Title: Parameter Box: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training
Journal-ref: SysML 2018
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[26]  arXiv:1804.09494 (replaced) [pdf, other]
Title: On Optimizing Distributed Tucker Decomposition for Sparse Tensors
Comments: Abridged version of the paper to appear in the proceedings of ICS'18
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[27]  arXiv:1805.07891 (replaced) [pdf, other]
Title: Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[28]  arXiv:1905.02119 (replaced) [pdf, ps, other]
Title: Lynceus: Cost-efficient Tuning and Provisioning of Data Analytic Jobs
Comments: This updated version features a novel extension of our approach: the time out mechanism. Additionally, we improved the write-up of the paper, fruit of the collaboration with professor David Garlan and Carnegie Mellon University
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[29]  arXiv:1910.10434 (replaced) [pdf, ps, other]
Title: Divide and Scale: Formalization of Distributed Ledger Sharding Protocols
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[30]  arXiv:1911.06969 (replaced) [pdf, other]
Title: Pangolin: An Efficient and Flexible Graph Pattern Mining System on CPU and GPU
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[31]  arXiv:2001.01603 (replaced) [pdf, other]
Title: GeoBroker: Leveraging Geo-Contexts for IoT Data Distribution
Comments: Accepted for publication in Elsevier Computer Communications
Journal-ref: Computer Communications 151 (2020) 473-484
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[32]  arXiv:2001.03836 (replaced) [pdf, ps, other]
Title: Private and Communication-Efficient Edge Learning: A Sparse Differential Gaussian-Masking Distributed SGD Approach
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI)
[33]  arXiv:1902.04899 (replaced) [pdf, other]
Title: Local approximation of the Maximum Cut in regular graphs
Comments: 19 pages, 6 figures - Full version of a paper accepted in the 45th International Workshop on Graph-Theoretic Concepts in Computer Science (WG 2019)
Subjects: Combinatorics (math.CO); Distributed, Parallel, and Cluster Computing (cs.DC)
[34]  arXiv:1906.08986 (replaced) [pdf, other]
Title: Database Meets Deep Learning: Challenges and Opportunities
Comments: The first version of this paper has appeared in SIGMOD Record. In this (third) version, we extend it to include the recent developments in this field and references to recent work (especially for section 3.2 and section 4.2)
Journal-ref: ACM SIGMOD Record, Volume 45 Issue 2, June 2016, Pages 17-22
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
[35]  arXiv:1906.10575 (replaced) [pdf, other]
Title: Parallel Performance of Algebraic Multigrid Domain Decomposition (AMG-DD)
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
[36]  arXiv:1908.11488 (replaced) [pdf, ps, other]
Title: Quantum Distributed Algorithm for Triangle Finding in the CONGEST Model
Comments: 13 pages; v2: minor corrections; to appear in the proceedings of STACS'20
Subjects: Quantum Physics (quant-ph); Distributed, Parallel, and Cluster Computing (cs.DC)
[37]  arXiv:1909.04746 (replaced) [pdf, other]
Title: Tighter Theory for Local SGD on Identical and Heterogeneous Data
Comments: 30 pages, 1 algorithm, 5 theorems, 5 corollaries, 14 lemmas, 2 propositions, 5 figures
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA); Optimization and Control (math.OC); Machine Learning (stat.ML)
[38]  arXiv:1911.04200 (replaced) [pdf, other]
Title: Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Subjects: Computational Engineering, Finance, and Science (cs.CE); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Genomics (q-bio.GN)
[39]  arXiv:1911.07154 (replaced) [pdf, ps, other]
Title: Sparse Hopsets in Congested Clique
Authors: Yasamin Nazari
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)