Title: Sparse Coresets for SVD on Infinite Streams

Abstract: In streaming Singular Value Decomposition (SVD), $d$-dimensional rows of a possibly infinite matrix arrive sequentially as points in $\mathbb{R}^d$. An $\epsilon$-coreset is a (much smaller) matrix whose sum of squared distances from its rows to any hyperplane approximates that of the original matrix to within a $1 \pm \epsilon$ factor. Our main result is that we can maintain an $\epsilon$-coreset while storing only $O(d \log^2 d / \epsilon^2)$ rows. Known lower bounds of $\Omega(d / \epsilon^2)$ rows show that this is nearly optimal. Moreover, our coreset is a weighted subset of the input rows. This is highly desirable since it: (1) preserves sparsity; (2) is easily interpretable; (3) avoids precision errors; (4) applies to problems with constraints on the input. Previous streaming results for SVD that return a subset of the input required storing $\Omega(d \log^3 n / \epsilon^2)$ rows, where $n$ is the number of rows seen so far. Our algorithm, with storage independent of $n$, is the first result that uses finite memory on infinite streams. We support our findings with experiments on the Wikipedia dataset benchmarked against state-of-the-art algorithms.
Subjects: Data Structures and Algorithms (cs.DS)
Cite as: arXiv:2002.06296 [cs.DS]
  (or arXiv:2002.06296v3 [cs.DS] for this version)
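
The $\epsilon$-coreset definition in the abstract can be made concrete with a small check. The following is a minimal sketch, not the paper's streaming algorithm: it only measures how well a given weighted subset of rows preserves the sum of squared distances to a sample of hyperplanes. It assumes hyperplanes through the origin (so the distance from a row $p$ to the hyperplane with unit normal $x$ is $|\langle p, x \rangle|$), and the names `hyperplane_cost` and `max_relative_error` are hypothetical helpers introduced here for illustration. Testing finitely many random normals can only suggest, not prove, that the $1 \pm \epsilon$ guarantee holds for every hyperplane.

```python
import numpy as np


def hyperplane_cost(rows, weights, normal):
    """Weighted sum of squared distances from rows to the hyperplane
    through the origin whose normal vector is `normal` (assumption:
    hyperplanes pass through the origin)."""
    unit = normal / np.linalg.norm(normal)
    return float(np.sum(weights * (rows @ unit) ** 2))


def max_relative_error(A, C, w, normals):
    """Largest relative deviation between the cost of the weighted rows
    (C, w) and the cost of the full matrix A, over the sampled normals."""
    worst = 0.0
    for x in normals:
        full = hyperplane_cost(A, np.ones(len(A)), x)
        approx = hyperplane_cost(C, w, x)
        if full > 0:
            worst = max(worst, abs(approx - full) / full)
    return worst


# Toy input: every row of A appears twice, so keeping one copy of each
# row with weight 2 reproduces the cost exactly (up to rounding).
rng = np.random.default_rng(0)
base = rng.standard_normal((50, 5))
A = np.vstack([base, base])
C, w = base, np.full(50, 2.0)
normals = rng.standard_normal((100, 5))

err = max_relative_error(A, C, w, normals)
print(err)
```

On this toy input the weighted subset is exact, so `err` is zero up to floating-point rounding. For a genuine coreset one would expect `err` to stay at or below the chosen $\epsilon$ on the sampled normals.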

Submission history

From: Adiel Statman
[v1] Sat, 15 Feb 2020 01:29:40 GMT (344kb,D)
[v2] Mon, 23 Nov 2020 17:52:23 GMT (948kb,D)
[v3] Thu, 26 Nov 2020 18:48:58 GMT (470kb,D)
