Title: Random Projections for k-Means: Maintaining Coresets Beyond Merge & Reduce

Abstract: We give a new construction for a small-space summary satisfying the coreset guarantee of a data set with respect to the $k$-means objective function. The number of points required in an offline construction is in $\tilde{O}(k \epsilon^{-2}\min(d,k\epsilon^{-2}))$, which is minimal among all available constructions.
Aside from two constructions with exponential dependence on the dimension, all known coresets are maintained in data streams via the merge and reduce framework, which incurs a large space dependency on $\log n$. Instead, our construction crucially relies on Johnson-Lindenstrauss type embeddings, which, combined with results from online algorithms, give us a new technique for efficiently maintaining coresets in data streams without relying on merge and reduce. The final number of points stored by our algorithm in a data stream is in $\tilde{O}(k^2 \epsilon^{-2} \log^2 n \min(d,k\epsilon^{-2}))$.
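
As a rough illustration of the kind of Johnson-Lindenstrauss type embedding the abstract refers to, the sketch below applies a generic dense Gaussian projection to synthetic data and compares the $k$-means cost of a fixed partition before and after projection. This is not the paper's construction (the paper was withdrawn). The function names `jl_matrix`, `project`, `kmeans_cost` and `demo`, and all parameter values, are assumptions made for this example.

```python
import math
import random


def jl_matrix(target_dim, source_dim, rng):
    """Dense Gaussian map with i.i.d. N(0, 1/target_dim) entries (a generic JL-type embedding)."""
    scale = 1.0 / math.sqrt(target_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(source_dim)]
            for _ in range(target_dim)]


def project(matrix, point):
    """Apply the linear map to a single point."""
    return [sum(r * x for r, x in zip(row, point)) for row in matrix]


def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to the centroid of its assigned cluster."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for j, x in enumerate(p):
            sums[c][j] += x
    # Empty clusters keep a zero centroid; no point is ever compared against them.
    centroids = [[s / max(counts[c], 1) for s in sums[c]] for c in range(k)]
    return sum(sum((x - m) ** 2 for x, m in zip(p, centroids[c]))
               for p, c in zip(points, labels))


def demo(seed=0, n=30, d=200, m=64, k=3):
    """Return (cost in the original space, cost after projection) for one fixed partition."""
    rng = random.Random(seed)
    # Hypothetical synthetic data: k well-separated centers plus unit Gaussian noise.
    centers = [[rng.gauss(0.0, 5.0) for _ in range(d)] for _ in range(k)]
    labels = [i % k for i in range(n)]
    points = [[c + rng.gauss(0.0, 1.0) for c in centers[lab]] for lab in labels]
    embedding = jl_matrix(m, d, rng)
    projected = [project(embedding, p) for p in points]
    return kmeans_cost(points, labels, k), kmeans_cost(projected, labels, k)


if __name__ == "__main__":
    original, reduced = demo()
    print("cost ratio after projection:", reduced / original)
```

The ratio printed at the end is expected to be close to 1 for a target dimension of this size. The example checks only one fixed partition, whereas a coreset guarantee must hold for all candidate solutions simultaneously.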
Comments: This paper has been withdrawn due to an error in Theorem 1
Subjects: Data Structures and Algorithms (cs.DS)
Cite as: arXiv:1504.01584 [cs.DS]
  (or arXiv:1504.01584v5 [cs.DS] for this version)

Submission history

From: Chris Schwiegelshohn
[v1] Tue, 7 Apr 2015 12:45:29 GMT (14kb)
[v2] Wed, 8 Apr 2015 11:17:08 GMT (17kb)
[v3] Fri, 24 Apr 2015 08:04:32 GMT (17kb)
[v4] Thu, 9 Jul 2015 14:20:48 GMT (0kb,I)
[v5] Tue, 18 Feb 2020 16:44:34 GMT (0kb,I)
