Title: An Online Hierarchical Algorithm for Extreme Clustering

Abstract: Many modern clustering methods scale well to a large number of data items, N, but not to a large number of clusters, K. This paper introduces PERCH, a new non-greedy algorithm for online hierarchical clustering that scales to both massive N and K, a problem setting we term extreme clustering. Our algorithm efficiently routes new data points to the leaves of an incrementally built tree. Motivated by the desire for both accuracy and speed, our approach performs tree rotations for the sake of enhancing subtree purity and encouraging balancedness. We prove that, under a natural separability assumption, our non-greedy algorithm will produce trees with perfect dendrogram purity regardless of online data arrival order. Our experiments demonstrate that PERCH constructs more accurate trees than other tree-building clustering algorithms and scales well with both N and K, achieving a higher-quality clustering than the strongest flat clustering competitor in nearly half the time.
Comments: 20 pages. Code available here: this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1704.01858 [cs.LG]
  (or arXiv:1704.01858v1 [cs.LG] for this version)
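
The abstract states its guarantee in terms of dendrogram purity without defining it. As a rough illustration only, the sketch below computes the commonly used definition of that metric: over all pairs of points sharing a ground-truth class, average the fraction of leaves under the pair's lowest common ancestor that belong to that class. This is not the authors' code. The nested-tuple tree encoding, the function name `dendrogram_purity`, and the `labels` dictionary are all assumptions made for this example.

```python
from collections import Counter


def dendrogram_purity(tree, labels):
    """Dendrogram purity of a binary tree (illustrative sketch).

    tree   : nested 2-tuples; anything that is not a tuple is a leaf id
             (assumed encoding, so leaf ids must not themselves be tuples).
    labels : dict mapping each leaf id to its ground-truth class.
    """
    weighted_purity = 0.0  # sum of purity over all same-class pairs
    pair_count = 0         # number of same-class pairs

    def visit(node):
        nonlocal weighted_purity, pair_count
        if not isinstance(node, tuple):
            return Counter([labels[node]])
        left, right = node
        left_counts, right_counts = visit(left), visit(right)
        merged = left_counts + right_counts
        size = sum(merged.values())
        # A same-class pair has this node as its lowest common ancestor
        # exactly when one point lies in each child subtree.
        for cls in left_counts.keys() & right_counts.keys():
            n_pairs = left_counts[cls] * right_counts[cls]
            weighted_purity += n_pairs * merged[cls] / size
            pair_count += n_pairs
        return merged

    visit(tree)
    if pair_count == 0:
        raise ValueError("no two points share a ground-truth class")
    return weighted_purity / pair_count


# Toy data: two classes with two points each.
toy_labels = {"a": 0, "b": 0, "c": 1, "d": 1}
pure_tree = (("a", "b"), ("c", "d"))   # each class sits in its own subtree
mixed_tree = (("a", "c"), ("b", "d"))  # classes only meet at the root
```

On the toy data, `dendrogram_purity(pure_tree, toy_labels)` returns 1.0, the "perfect" case the abstract refers to, while `dendrogram_purity(mixed_tree, toy_labels)` returns 0.5, because every same-class pair meets only at the root, where half the leaves belong to each class.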

Submission history

From: Nicholas Monath
[v1] Thu, 6 Apr 2017 14:29:10 GMT (338kb,D)
