We gratefully acknowledge support from
the Simons Foundation and member institutions.

Distributed, Parallel, and Cluster Computing

New submissions

[ total of 12 entries: 1-12 ]
[ showing up to 2000 entries per page: fewer | more ]

New submissions for Thu, 8 Dec 22

[1]  arXiv:2212.03332 [pdf, other]
Title: Edge Impulse: An MLOps Platform for Tiny Machine Learning
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Software Engineering (cs.SE)

Edge Impulse is a cloud-based machine learning operations (MLOps) platform for developing embedded and edge ML (TinyML) systems that can be deployed to a wide range of hardware targets. Current TinyML workflows are plagued by fragmented software stacks and heterogeneous deployment hardware, making ML model optimizations difficult and unportable. We present Edge Impulse, a practical MLOps platform for developing TinyML systems at scale. Edge Impulse addresses these challenges and streamlines the TinyML design cycle by supporting various software and hardware optimizations to create an extensible and portable software stack for a multitude of embedded systems. As of Oct. 2022, Edge Impulse hosts 118,185 projects from 50,953 developers.

[2]  arXiv:2212.03410 [pdf, other]
Title: SAIH: A Scalable Evaluation Methodology for Understanding AI Performance Trend on HPC Systems
Comments: 14 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI); Performance (cs.PF)

Novel artificial intelligence (AI) technology has expedited various scientific research, e.g., cosmology, physics and bioinformatics, inevitably becoming a significant category of workload on high performance computing (HPC) systems. Existing AI benchmarks tend to customize well-recognized AI applications, so as to evaluate the AI performance of HPC systems under predefined problem size, in terms of datasets and AI models. Due to lack of scalability on the problem size, static AI benchmarks might be under competent to help understand the performance trend of evolving AI applications on HPC systems, in particular, the scientific AI applications on large-scale systems.
In this paper, we propose a scalable evaluation methodology (SAIH) for analyzing the AI performance trend of HPC systems with scaling the problem sizes of customized AI applications. To enable scalability, SAIH builds a set of novel mechanisms for augmenting problem sizes. As the data and model constantly scale, we can investigate the trend and range of AI performance on HPC systems, and further diagnose system bottlenecks. To verify our methodology, we augment a cosmological AI application to evaluate a real HPC system equipped with GPUs as a case study of SAIH.

[3]  arXiv:2212.03414 [pdf, other]
Title: SDRM3: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads
Comments: 13 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control often involve dynamic behaviors in various levels; task, model, and layers (or, ML operators) within a model. Such dynamic behaviors are new challenges to the system software in an ML system because the overall system load is unpredictable unlike traditional ML workloads. Also, the real-time processing requires to meet deadlines, and multi-model workloads involve highly heterogeneous models. As RTMM workloads often run on resource-constrained devices (e.g., VR headset), developing an effective scheduler is an important research problem.
Therefore, we propose a new scheduler, SDRM3, that effectively handles various dynamicity in RTMM style workloads targeting multi-accelerator systems. To make scheduling decisions, SDRM3 quantifies the unique requirements for RTMM workloads and utilizes the quantified scores to drive scheduling decisions, considering the current system load and other inference jobs on different models and input frames. SDRM3 has tunable parameters that provide fast adaptivity to dynamic workload changes based on a gradient descent-like online optimization, which typically converges within five steps for new workloads. In addition, we also propose a method to exploit model level dynamicity based on Supernet for exploiting the trade-off between the scheduling effectiveness and model performance (e.g., accuracy), which dynamically selects a proper sub-network in a Supernet based on the system loads.
In our evaluation on five realistic RTMM workload scenarios, SDRM3 reduces the overall UXCost, which is a energy-delay-product (EDP)-equivalent metric for real-time applications defined in the paper, by 37.7% and 53.2% on geometric mean (up to 97.6% and 97.1%) compared to state-of-the-art baselines, which shows the efficacy of our scheduling methodology.

[4]  arXiv:2212.03547 [pdf, other]
Title: A Fault Tolerant Elastic Resource Management Framework Towards High Availability of Cloud Services
Comments: IEEE Transactions of Network and Service Management, 2022
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Cloud computing has become inevitable for every digital service which has exponentially increased its usage. However, a tremendous surge in cloud resource demand stave off service availability resulting into outages, performance degradation, load imbalance, and excessive power-consumption. The existing approaches mainly attempt to address the problem by using multi-cloud and running multiple replicas of a virtual machine (VM) which accounts for high operational-cost. This paper proposes a Fault Tolerant Elastic Resource Management (FT-ERM) framework that addresses aforementioned problem from a different perspective by inducing high-availability in servers and VMs. Specifically, (1) an online failure predictor is developed to anticipate failure-prone VMs based on predicted resource contention; (2) the operational status of server is monitored with the help of power analyser, resource estimator and thermal analyser to identify any failure due to overloading and overheating of servers proactively; and (3) failure-prone VMs are assigned to proposed fault-tolerance unit composed of decision matrix and safe box to trigger VM migration and handle any outage beforehand while maintaining desired level of availability for cloud users. The proposed framework is evaluated and compared against state-of-the-arts by executing experiments using two real-world datasets. FT-ERM improved the availability of the services up to 34.47% and scales down VM-migration and power-consumption up to 88.6% and 62.4%, respectively over without FT-ERM approach.

[5]  arXiv:2212.03802 [pdf, other]
Title: Distributed Load Orchestration for Vision Computing in Multi-Access Edge Computing
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)

Multi-access Edge Computing (MEC) is a type of network architecture that provides cloud computing capabilities at the edge of the network. We consider the use case of video surveillance for an university campus running on a 5G-MEC environment. A key issue is the eventual overloading of computing resources on the MEC nodes during peak demand. We propose a new strategy for distributed orchestration in MEC environments based on how load balancing strategies organize processing queue. Then, we elaborated a strategy for deadline-aware queueing prioritization that organizes requests based on pre-established thresholds. We introduce a simulation-based experimentation environment and conduct a number of tests demonstrating the benefit of our approach by reducing the number of referrals and improving the effectiveness in meeting deadlines.

Cross-lists for Thu, 8 Dec 22

[6]  arXiv:2212.03426 (cross-list from cs.ET) [pdf, other]
Title: Efficient Optimization with Higher-Order Ising Machines
Comments: 13 pages, 4 figures
Subjects: Emerging Technologies (cs.ET); Distributed, Parallel, and Cluster Computing (cs.DC); Neural and Evolutionary Computing (cs.NE)

A prominent approach to solving combinatorial optimization problems on parallel hardware is Ising machines, i.e., hardware implementations of networks of interacting binary spin variables. Most Ising machines leverage second-order interactions although important classes of optimization problems, such as satisfiability problems, map more seamlessly to Ising networks with higher-order interactions. Here, we demonstrate that higher-order Ising machines can solve satisfiability problems more resource-efficiently in terms of the number of spin variables and their connections when compared to traditional second-order Ising machines. Further, our results show on a benchmark dataset of Boolean \textit{k}-satisfiability problems that higher-order Ising machines implemented with coupled oscillators rapidly find solutions that are better than second-order Ising machines, thus, improving the current state-of-the-art for Ising machines.

[7]  arXiv:2212.03481 (cross-list from cs.LG) [pdf, other]
Title: Bringing the Algorithms to the Data -- Secure Distributed Medical Analytics using the Personal Health Train (PHT-meDIC)
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)

The need for data privacy and security -- enforced through increasingly strict data protection regulations -- renders the use of healthcare data for machine learning difficult. In particular, the transfer of data between different hospitals is often not permissible and thus cross-site pooling of data not an option. The Personal Health Train (PHT) paradigm proposed within the GO-FAIR initiative implements an 'algorithm to the data' paradigm that ensures that distributed data can be accessed for analysis without transferring any sensitive data. We present PHT-meDIC, a productively deployed open-source implementation of the PHT concept. Containerization allows us to easily deploy even complex data analysis pipelines (e.g, genomics, image analysis) across multiple sites in a secure and scalable manner. We discuss the underlying technological concepts, security models, and governance processes. The implementation has been successfully applied to distributed analyses of large-scale data, including applications of deep neural networks to medical image data.

Replacements for Thu, 8 Dec 22

[8]  arXiv:2205.00393 (replaced) [pdf, other]
Title: Lifetime-based Optimization for Simulating Quantum Circuits on a New Sunway Supercomputer
Comments: 11 pages, 12 figures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Quantum Physics (quant-ph)
[9]  arXiv:2208.09513 (replaced) [pdf, other]
Title: Globus Automation Services: Research process automation across the space-time continuum
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
[10]  arXiv:2210.17174 (replaced) [pdf, other]
Title: uBFT: Microsecond-scale BFT using Disaggregated Memory [Extended Version]
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
[11]  arXiv:2211.08770 (replaced) [pdf, other]
Title: On some orthogonalization schemes in Tensor Train format
Authors: Olivier Coulaud (CONCACE), Luc Giraud (CONCACE), Martina Iannacito (CONCACE)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA)
[12]  arXiv:2211.04908 (replaced) [pdf, other]
Title: Profiling and Improving the PyTorch Dataloader for high-latency Storage: A Technical Report
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
[ total of 12 entries: 1-12 ]
[ showing up to 2000 entries per page: fewer | more ]

Disable MathJax (What is MathJax?)

Links to: arXiv, form interface, find, cs, recent, 2212, contact, help  (Access key information)