
Title: Improving Prediction-Based Lossy Compression Dramatically Via Ratio-Quality Modeling

Abstract: Error-bounded lossy compression is one of the most effective techniques for scientific data reduction. However, the traditional trial-and-error approach to configuring lossy compressors, which searches for the optimal trade-off between reconstructed data quality and compression ratio, is prohibitively expensive. To resolve this issue, we develop a general-purpose analytical ratio-quality model based on the prediction-based lossy compression framework. The model can effectively predict the reconstructed data quality and the compression ratio, as well as the impact of the lossy compressed data on post-hoc analysis quality. Our analytical model significantly improves prediction-based lossy compression in three use cases: (1) predictor optimization, by selecting the best-fit predictor; (2) memory compression with a target compression ratio; and (3) in-situ compression optimization, by fine-grained error-bound tuning of different data partitions. We evaluate our analytical model on 10 scientific datasets, demonstrating its high accuracy (93.47% on average) and low computational cost (up to 18.7X lower than the trial-and-error approach) in estimating the compression ratio and the impact of lossy compression on post-hoc analysis quality. We also verify the high efficiency of our ratio-quality model with different applications across the three use cases. In addition, our experiments demonstrate that our modeling-based approach reduces the time to store 3D Reverse Time Migration data by up to 3.4X over the traditional solution, using 128 CPU cores across 8 compute nodes.
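To make the ratio-quality trade-off concrete, here is a minimal sketch of a generic proxy, not the authors' analytical model. It assumes a 1D previous-value (first-order Lorenzo) predictor with linear-scale quantization, as in SZ-style prediction-based compressors, and it uses the Shannon entropy of the quantization codes as a stand-in for Huffman-coded size, ignoring the code table and any lossless backend stage. All function names (`quantize_lorenzo_1d`, `estimate_ratio`, `psnr`, `error_bound_for_target_ratio`) are hypothetical. The last function hints at use case (2): it searches for an error bound that meets a target ratio using only the cheap estimate rather than full compression runs.

```python
import math
from collections import Counter


def quantize_lorenzo_1d(data, error_bound):
    """Hypothetical helper: prediction + linear quantization of a 1D series.

    Each value is predicted from the previously *reconstructed* value, so a
    decompressor could reproduce the same prediction. The bin width is
    2 * error_bound, which keeps every point within error_bound of the original.
    """
    if error_bound <= 0:
        raise ValueError("error_bound must be positive")
    codes, recon = [], []
    prev = 0.0
    for x in data:
        pred = prev  # previous-value (1D Lorenzo) predictor
        q = round((x - pred) / (2 * error_bound))  # integer quantization code
        r = pred + 2 * error_bound * q  # reconstructed value, |x - r| <= error_bound
        codes.append(q)
        recon.append(r)
        prev = r
    return codes, recon


def estimate_ratio(codes, bits_per_value=32):
    """Hypothetical rough ratio estimate from the entropy of the codes.

    Uses max(H, 1) bits per code as a proxy for Huffman-coded size (a Huffman
    code spends at least 1 bit per symbol); table overhead and any lossless
    backend are ignored, so this is only an approximation.
    """
    n = len(codes)
    counts = Counter(codes)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return bits_per_value / max(entropy, 1.0)


def psnr(original, recon):
    """Peak signal-to-noise ratio in dB, using the value range as the peak."""
    value_range = max(original) - min(original)
    if value_range == 0:
        raise ValueError("PSNR with a range-based peak is undefined for constant data")
    mse = sum((a - b) ** 2 for a, b in zip(original, recon)) / len(original)
    if mse == 0:
        return math.inf
    return 20 * math.log10(value_range) - 10 * math.log10(mse)


def error_bound_for_target_ratio(data, target_ratio, lo=1e-8, iters=40):
    """Hypothetical geometric bisection on the error bound.

    Assumes the estimated ratio grows (roughly) with the error bound and that
    an error bound equal to the value range already meets the target; under
    that assumption the returned bound has an estimated ratio >= target_ratio.
    """
    hi = (max(data) - min(data)) or 1.0  # fall back to 1.0 for constant data
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # midpoint in log space: bounds span decades
        codes, _ = quantize_lorenzo_1d(data, mid)
        if estimate_ratio(codes) < target_ratio:
            lo = mid  # too little compression: loosen the bound
        else:
            hi = mid  # target met: try a tighter bound
    return hi


if __name__ == "__main__":
    data = [math.sin(i / 50) for i in range(10_000)]
    for eb in (1e-2, 1e-3, 1e-4):
        codes, recon = quantize_lorenzo_1d(data, eb)
        print(f"eb={eb:g}  est. ratio={estimate_ratio(codes):.2f}  "
              f"PSNR={psnr(data, recon):.1f} dB")
    eb = error_bound_for_target_ratio(data, target_ratio=8.0)
    print(f"error bound for an estimated ratio of 8: {eb:.3g}")
```

Bisecting in log space is a design choice here because useful error bounds typically span several orders of magnitude; because the loop only keeps bounds that met the target as `hi`, the result stays on the "target met" side whenever the initial bound does.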
Comments: 14 pages, 14 figures, submitted to ICDE 2022
Subjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2111.09815 [cs.DB]
  (or arXiv:2111.09815v2 [cs.DB] for this version)

Submission history

From: Dingwen Tao
[v1] Thu, 18 Nov 2021 17:29:42 GMT (2485kb,D)
[v2] Sun, 28 Nov 2021 11:43:56 GMT (2486kb,D)
