Adapting $k$-means algorithms for outliers

Grunau, Christoph; Rozhoň, Václav

(Submitted on 2 Jul 2020 (v1), last revised 23 Sep 2022 (this version, v2))

Abstract: This paper shows how to adapt several simple and classical sampling-based algorithms for the $k$-means problem to the setting with outliers.
Recently, Bhaskara et al. (NeurIPS 2019) showed how to adapt the classical $k$-means++ algorithm to the setting with outliers. However, their algorithm needs to output $O(\log (k) \cdot z)$ outliers, where $z$ is the number of true outliers, to match the $O(\log k)$-approximation guarantee of $k$-means++. In this paper, we build on their ideas and show how to adapt several sequential and distributed $k$-means algorithms to the setting with outliers, but with substantially stronger theoretical guarantees: our algorithms output $(1+\varepsilon)z$ outliers while achieving an $O(1 / \varepsilon)$-approximation to the objective function. In the sequential world, we achieve this by adapting a recent algorithm of Lattanzi and Sohler (ICML 2019). In the distributed setting, we adapt a simple algorithm of Guha et al. (IEEE Trans. Know. and Data Engineering 2003) and the popular $k$-means$\|$ of Bahmani et al. (PVLDB 2012).
A theoretical application of our techniques is an algorithm with running time $\tilde{O}(nk^2/z)$ that achieves an $O(1)$-approximation to the objective function while outputting $O(z)$ outliers, assuming $k \ll z \ll n$. This is complemented with a matching lower bound of $\Omega(nk^2/z)$ for this problem in the oracle model.
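
To make the objective in the abstract concrete, here is a minimal sketch of the standard $k$-means-with-outliers cost: the sum of squared distances from each point to its nearest center, after discarding the points farthest from the centers. The function name `kmeans_outlier_cost` and its parameters are hypothetical and not taken from the paper; the sketch only evaluates a given set of centers and implements none of the paper's algorithms.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_outlier_cost(
    points: List[Point], centers: List[Point], num_outliers: int
) -> Tuple[float, List[int]]:
    """Evaluate centers on the k-means objective with outliers.

    Returns (cost, outlier_indices): the indices of the `num_outliers`
    points farthest from their nearest center, and the sum of squared
    distances of all remaining points to their nearest center.
    """
    # Squared distance of every point to its closest center.
    dists = [min(sq_dist(p, c) for c in centers) for p in points]
    # Indices sorted from farthest to closest; the first ones are discarded.
    order = sorted(range(len(points)), key=lambda i: dists[i], reverse=True)
    outliers = sorted(order[:num_outliers])
    outlier_set = set(outliers)
    cost = sum(d for i, d in enumerate(dists) if i not in outlier_set)
    return cost, outliers
```

In this notation, the guarantee stated in the abstract allows an algorithm to discard up to $(1+\varepsilon)z$ points, while its cost is compared against the optimal cost achievable when discarding only $z$ points.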

Subjects:	Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Cite as:	arXiv:2007.01118 [cs.DS]
	(or arXiv:2007.01118v2 [cs.DS] for this version)

Submission history

From: Vaclav Rozhon
[v1] Thu, 2 Jul 2020 14:14:33 GMT (354kb,D)
[v2] Fri, 23 Sep 2022 14:26:00 GMT (354kb,D)
