Distributed, Parallel, and Cluster Computing

New submissions

[ total of 4 entries: 1-4 ]

New submissions for Fri, 15 Oct 21

[1]  arXiv:2110.07029 [pdf, other]
Title: Adaptive Elastic Training for Sparse Deep Learning on Heterogeneous Multi-GPU Servers
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)

Motivated by extreme multi-label classification applications, we consider training deep learning models over sparse data in multi-GPU servers. The variance in the number of non-zero features across training batches and the intrinsic GPU heterogeneity combine to limit accuracy and increase the time to convergence. We address these challenges with Adaptive SGD, an adaptive elastic model averaging stochastic gradient descent algorithm for heterogeneous multi-GPU servers that is characterized by dynamic scheduling, adaptive batch size scaling, and normalized model merging. Instead of statically partitioning batches across GPUs, Adaptive SGD routes batches based on each GPU's relative processing speed. Batch size scaling assigns larger batches to the faster GPUs and smaller batches to the slower ones, with the goal of reaching a steady state in which all the GPUs perform the same number of model updates. Normalized model merging computes optimal weights for every GPU based on its assigned batches so that the combined model achieves better accuracy. We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy and scales with the number of GPUs.
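
The abstract names three mechanisms but gives no formulas. The following is a minimal Python sketch of how they could fit together, under our own assumptions: batch sizes proportional to measured throughput (samples/s), a greedy "earliest idle GPU gets the next batch" dispatcher standing in for dynamic scheduling, and merge weights proportional to the number of samples each GPU processed. That weighting is a stand-in, not the paper's "optimal weights", and the function names and throughput figures are hypothetical.

```python
import heapq
from typing import List, Sequence


def scale_batch_sizes(speeds: Sequence[float], total_batch: int) -> List[int]:
    """Split total_batch across GPUs in proportion to measured speed (samples/s).

    Faster GPUs get larger batches, so every GPU needs about the same wall-clock
    time per update. Largest-remainder rounding keeps the sizes summing to total_batch.
    """
    total_speed = sum(speeds)
    raw = [total_batch * s / total_speed for s in speeds]
    sizes = [int(r) for r in raw]
    leftover = total_batch - sum(sizes)
    # Give the remaining samples to the GPUs with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


def dynamic_schedule(speeds: Sequence[float], sizes: Sequence[int], num_batches: int) -> List[int]:
    """Route each batch to whichever GPU becomes idle first; return updates per GPU.

    Assumption: a GPU's time per batch is simply batch_size / speed.
    """
    updates = [0] * len(speeds)
    # Heap of (time the GPU becomes free, GPU index); ties go to the lower index.
    idle = [(0.0, g) for g in range(len(speeds))]
    heapq.heapify(idle)
    for _ in range(num_batches):
        free_at, g = heapq.heappop(idle)
        updates[g] += 1
        heapq.heappush(idle, (free_at + sizes[g] / speeds[g], g))
    return updates


def merge_models(models: Sequence[Sequence[float]], samples: Sequence[int]) -> List[float]:
    """Average per-GPU parameter vectors, weighting each by its share of samples.

    Assumption: weights = samples_g / total_samples (not the paper's exact rule).
    """
    total = sum(samples)
    weights = [n / total for n in samples]
    return [sum(w * m[i] for w, m in zip(weights, models)) for i in range(len(models[0]))]


if __name__ == "__main__":
    speeds = [3.0, 1.0]                        # hypothetical samples/s for two GPUs
    sizes = scale_batch_sizes(speeds, 256)
    print(sizes)                               # [192, 64]
    updates = dynamic_schedule(speeds, sizes, 10)
    print(updates)                             # [5, 5]: both GPUs advance in lockstep
    samples = [u * b for u, b in zip(updates, sizes)]
    print(merge_models([[1.0, 2.0], [3.0, 6.0]], samples))  # [1.5, 3.0]
```

Proportional sizing is what produces the lockstep in this toy setting: with sizes [192, 64] both GPUs take 64 s per batch. Feeding equal batches of 128 into the same dispatcher instead lets the faster GPU run ahead, which is the imbalance batch size scaling is meant to remove.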

Cross-lists for Fri, 15 Oct 21

[2]  arXiv:2110.06991 (cross-list from cs.LG) [pdf, other]
Title: Scalable Graph Embedding Learning On A Single GPU
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)

Graph embedding techniques have attracted growing interest because they map graph data into a continuous, low-dimensional space. Effective graph analytics gives users a deeper understanding of what lies behind the data and can therefore benefit a variety of machine learning tasks. At the scale of current real-world applications, however, most graph analytic methods suffer from high computation and space costs: existing methods and systems can process networks with thousands to a few million nodes, but scaling to larger networks remains a challenge. The complexity of training graph embedding systems calls for accelerators such as GPUs. In this paper, we introduce a hybrid CPU-GPU framework that addresses the challenges of learning embeddings of large-scale graphs. We compare the performance of our method qualitatively and quantitatively with existing embedding systems on common benchmarks. We also show that our system can scale training to datasets an order of magnitude larger than a single machine's total memory capacity. The effectiveness of the learned embeddings is evaluated on multiple downstream applications, and the experimental results indicate their effectiveness in terms of both performance and accuracy.

[3]  arXiv:2110.07083 (cross-list from cs.CY) [pdf, other]
Title: Dynamic Conflict Resolution of IoT Services in Smart Homes
Comments: 15 pages, 5 figures, accepted for publication in the proceedings of the 19th International Conference on Service Oriented Computing (ICSOC 2021)
Subjects: Computers and Society (cs.CY); Distributed, Parallel, and Cluster Computing (cs.DC)

We propose a novel conflict resolution framework for IoT services in multi-resident smart homes. The proposed framework employs a preference extraction model based on a temporal proximity strategy. We design a preference aggregation model using a matrix factorization-based approach (i.e., singular value decomposition). We introduce the concepts of a current resident item matrix and an ideal resident item matrix as key criteria for the conflict resolution framework. Finally, we conduct a set of experiments on real-world datasets to show the effectiveness of the proposed approach.
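
The abstract does not spell out the aggregation formula. One plausible reading, sketched in Python under our own assumptions: rows are residents, columns are candidate service settings, entries are preference scores, and the rank-1 truncated SVD of that matrix, averaged over residents, serves as a consensus score. The current/ideal resident item matrices and the temporal-proximity extraction are not modeled, and all names and values here are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # hypothetical layout: rows = residents, cols = settings


def top_singular_triplet(m: Matrix, iters: int = 200) -> Tuple[float, List[float], List[float]]:
    """Leading singular value sigma and vectors u, v of m (power iteration on m^T m)."""
    rows, cols = len(m), len(m[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        mv = [sum(m[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        w = [sum(m[r][c] * mv[r] for r in range(rows)) for c in range(cols)]  # m^T (m v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(m[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    sigma = math.sqrt(sum(x * x for x in mv))
    u = [x / sigma for x in mv] if sigma > 0.0 else [0.0] * rows
    return sigma, u, v


def aggregate_preferences(m: Matrix) -> List[float]:
    """Assumed consensus: rank-1 reconstruction sigma*u*v^T averaged over residents."""
    sigma, u, v = top_singular_triplet(m)
    mean_u = sum(u) / len(u)
    return [sigma * mean_u * vc for vc in v]


def resolve_conflict(m: Matrix, items: List[str]) -> str:
    """Pick the setting with the highest consensus score."""
    scores = aggregate_preferences(m)
    return items[max(range(len(items)), key=scores.__getitem__)]


if __name__ == "__main__":
    settings = ["20C", "22C", "24C"]          # hypothetical thermostat options
    prefs = [[5.0, 3.0, 0.0],                 # resident A prefers cool
             [0.0, 3.0, 5.0]]                 # resident B prefers warm
    print([round(s, 3) for s in aggregate_preferences(prefs)])  # [2.5, 3.0, 2.5]
    print(resolve_conflict(prefs, settings))                    # 22C
```

In this toy case the rank-1 reconstruction drops the component on which the two residents disagree (singular value 5) and keeps the shared one (singular value sqrt(43)), so the compromise setting wins. Power iteration only keeps the sketch dependency-free; `numpy.linalg.svd` would be the usual choice in practice.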

Replacements for Fri, 15 Oct 21

[4]  arXiv:2106.16064 (replaced) [pdf, other]
Title: Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)