Title: pylspack: Parallel algorithms and data structures for sketching, column subset selection, regression and leverage scores

Abstract: We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix and (iii) computation of the squared row norms of the product of two matrices, with a special focus on "tall-and-skinny" matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package (this https URL) whose core is written in C++ and parallelized with OpenMP, and which is compatible with standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.
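As a rough illustration of the three operations named in the abstract, the following serial NumPy/SciPy sketch builds a CountSketch matrix, composes it with a Gaussian projection, and computes a Gram matrix and the squared row norms of a matrix product. This is not pylspack's API or its parallel implementation; all function names here (`countsketch_matrix`, `gaussian_countsketch`, `gram`, `squared_row_norms`) and the 1/sqrt(m) Gaussian scaling are assumptions chosen for the example.

```python
import numpy as np
import scipy.sparse as sp


def countsketch_matrix(r, n, rng):
    """Hypothetical helper: r x n CountSketch matrix.

    Each column holds exactly one nonzero, a random sign (+1 or -1)
    placed in a uniformly random row.
    """
    rows = rng.integers(0, r, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    cols = np.arange(n)
    return sp.csr_matrix((signs, (rows, cols)), shape=(r, n))


def gaussian_countsketch(A, m, r, rng):
    """Hypothetical helper: compute G @ (S @ A).

    S is an r x n CountSketch and G is an m x r Gaussian matrix,
    scaled by 1/sqrt(m) (an assumed normalization).
    """
    S = countsketch_matrix(r, A.shape[0], rng)
    G = rng.standard_normal((m, r)) / np.sqrt(m)
    return G @ (S @ A)


def gram(A):
    """Gram matrix A^T A of a tall-and-skinny matrix A."""
    return A.T @ A


def squared_row_norms(A, B):
    """Squared Euclidean norm of every row of the product A @ B."""
    C = A @ B
    return np.einsum("ij,ij->i", C, C)


rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))       # tall-and-skinny input
SA = gaussian_countsketch(A, m=20, r=100, rng=rng)

# Exact leverage scores are the squared row norms of A @ inv(R),
# where A = QR is a thin QR factorization.
_, R = np.linalg.qr(A)
leverage = squared_row_norms(A, np.linalg.inv(R))
```

In this example the leverage scores are computed exactly from a QR factorization of the full matrix; the sketched matrix `SA` is shown only to illustrate the shape reduction from 1000 rows to 20.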
Comments: To appear in ACM TOMS
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS)
MSC classes: 65F50, 65F08, 65F20, 68W10, 68W20, 68P05
ACM classes: F.2.1; G.3; E.1
DOI: 10.1145/3555370
Cite as: arXiv:2203.02798 [cs.DS]
  (or arXiv:2203.02798v2 [cs.DS] for this version)

Submission history

From: Aleksandros Sobczyk
[v1] Sat, 5 Mar 2022 18:21:05 GMT (132kb,D)
[v2] Thu, 4 Aug 2022 22:09:46 GMT (132kb,D)
