Performance

New submissions

New submissions for Thu, 30 Jun 22

[1]  arXiv:2206.14286 [pdf, ps, other]
Title: TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s
Subjects: Performance (cs.PF); Machine Learning (cs.LG)

This paper presents a novel nearest neighbor search algorithm that achieves peak performance on the TPU (Google Tensor Processing Unit), outperforming state-of-the-art GPU algorithms at a similar level of recall. The design of the proposed algorithm is motivated by an accurate accelerator performance model that accounts for both memory and instruction bottlenecks. The algorithm comes with an analytical guarantee of recall in expectation and does not require maintaining a sophisticated index data structure or tuning, making it suitable for applications with frequent updates. Our work is available in the open-source packages JAX and TensorFlow on TPU.
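To make the terms "index-free search" and "recall" concrete, the sketch below shows an exact brute-force k-nearest-neighbor search by inner product, plus a recall measure that compares any candidate result against it. This is not the paper's TPU algorithm; it is a plain-Python baseline. The function names and the tiny dataset are illustrative assumptions.

```python
import heapq


def inner_product(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def brute_force_knn(query, database, k):
    """Return the indices of the k database vectors with the largest
    inner product against the query, best match first.

    No index structure is built: every vector is scored on every call,
    so inserting or deleting vectors needs no maintenance step.
    """
    scored = ((inner_product(query, vec), idx) for idx, vec in enumerate(database))
    return [idx for _, idx in heapq.nlargest(k, scored)]


def recall_at_k(candidate_ids, exact_ids):
    """Fraction of the exact top-k neighbors that the candidate list contains."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact.intersection(candidate_ids)) / len(exact)


# Hypothetical toy data: four 2-D vectors and one query.
database = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query = [1.0, 0.2]

exact_top2 = brute_force_knn(query, database, k=2)
# Score an imagined approximate answer against the exact one.
recall = recall_at_k([0, 1], exact_top2)
```

An approximate method trades some of this recall for speed; the exact search above is what such a method would typically be measured against.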

[2]  arXiv:2206.14505 [pdf, other]
Title: Rate Lifting for Stochastic Process Algebra: Exploiting Structural Properties
Subjects: Performance (cs.PF)

This report presents an algorithm for determining the unknown rates in the sequential processes of a Stochastic Process Algebra model, provided that the rates in the combined flat model are given. Such a rate lifting is useful for model reengineering and model repair. Technically, the algorithm works by solving systems of nonlinear equations and, if necessary, adjusting the model's synchronisation structure without changing its transition system. This report contains the complete pseudo-code of the algorithm. The approach taken by the algorithm exploits some structural properties of Stochastic Process Algebra systems, which are formulated here for the first time and could also be very beneficial in other contexts.

Cross-lists for Thu, 30 Jun 22

[3]  arXiv:2206.14761 (cross-list from cs.DC) [pdf, other]
Title: Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5
Comments: 13 pages, 18 figures, accepted by ACM/IEEE SC'22
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)

Lossy compression is one of the most efficient solutions to reduce storage overhead and improve I/O performance for HPC applications. However, existing parallel I/O libraries cannot fully utilize lossy compression to accelerate parallel write because compression-write performance is not yet well understood. To this end, we propose to deeply integrate predictive lossy compression with HDF5 to significantly improve parallel-write performance. Specifically, we propose analytical models that predict the time of compression and parallel write before the actual compression, enabling compression-write overlapping. We also reserve extra space in the process to handle possible data overflows resulting from prediction uncertainty in compression ratios. Moreover, we propose an optimization that reorders the compression tasks to increase the overlapping efficiency. Experiments with up to 4,096 cores on Summit show that our solution improves write performance by up to 4.5X and 2.9X over the non-compression and lossy compression solutions, respectively, with only 1.5% storage overhead (compared to the original data) on two real-world HPC applications.
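The effect of overlapping compression with writes, and of reordering tasks to improve that overlap, can be illustrated with a simplified two-stage pipeline. The sketch below assumes a single process, one compression stage feeding one write stage, and per-task times that are already predicted. The ordering uses Johnson's rule, a classic two-stage flow-shop heuristic that stands in here for the paper's own optimization. All names and numbers are hypothetical.

```python
def pipeline_makespan(tasks):
    """Total time to finish all tasks, given (compress_time, write_time) pairs
    in execution order. The write of one task may overlap with the
    compression of the following tasks.
    """
    compress_done = 0.0
    write_done = 0.0
    for compress_time, write_time in tasks:
        compress_done += compress_time
        # A write starts once its data is compressed and the writer is free.
        write_done = max(write_done, compress_done) + write_time
    return write_done


def johnson_order(tasks):
    """Reorder tasks with Johnson's rule: tasks whose compression is no longer
    than their write go first (shortest compression first); the rest go
    last (longest write first).
    """
    front = sorted((t for t in tasks if t[0] <= t[1]), key=lambda t: t[0])
    back = sorted((t for t in tasks if t[0] > t[1]), key=lambda t: t[1], reverse=True)
    return front + back


# Hypothetical predicted (compress_time, write_time) pairs, in seconds.
tasks = [(5, 1), (1, 5), (3, 3)]

serial_time = sum(c + w for c, w in tasks)           # no overlap at all
overlapped_time = pipeline_makespan(tasks)           # overlap, original order
reordered_time = pipeline_makespan(johnson_order(tasks))  # overlap, reordered
```

With these toy numbers, overlap alone shortens the run and reordering shortens it further. In practice the predicted times would be uncertain, which is the kind of error the extra reserved space in the abstract is meant to absorb.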
