On the optimality of kernels for high-dimensional clustering

Vankadara, Leena Chennuru; Ghoshdastidar, Debarghya

Statistics > Machine Learning

Title: On the optimality of kernels for high-dimensional clustering

Authors: Leena Chennuru Vankadara, Debarghya Ghoshdastidar

(Submitted on 1 Dec 2019)

Abstract: This paper studies the optimality of kernel methods in high-dimensional data clustering. Recent works have studied the large-sample performance of kernel clustering in the high-dimensional regime, where Euclidean distance becomes less informative. However, it is unknown whether popular methods, such as kernel k-means, are optimal in this regime. We consider the problem of high-dimensional Gaussian clustering and show that, with the exponential kernel function, the sufficient conditions for partial recovery of clusters using the NP-hard kernel k-means objective match the known information-theoretic limit up to a factor of $\sqrt{2}$ for large $k$. These conditions also exactly match the known upper bounds for the non-kernel setting. We also show that a semi-definite relaxation of the kernel k-means procedure matches, up to constant factors, the spectral threshold, below which no polynomial-time algorithm is known to succeed. This is the first work that provides such optimality guarantees for kernel k-means as well as its convex relaxation. Our proofs demonstrate the utility of the lesser-known polynomial concentration results for random variables with exponentially decaying tails in a higher-order analysis of kernel methods.
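
For readers unfamiliar with the objective named in the abstract, the following is a minimal sketch of the kernel k-means cost evaluated on synthetic Gaussian clusters. It is illustrative only and is not taken from the paper: the kernel form exp(<x, y> / scale), the default scale equal to the dimension, the data generator, and all function names are assumptions made for this example. It only evaluates the cost of a given partition; it does not solve the NP-hard optimization or the semi-definite relaxation.

```python
import numpy as np


def exponential_kernel(X, scale=None):
    """Gram matrix K[a, b] = exp(<x_a, x_b> / scale).

    The kernel form and the default scale (the dimension p) are
    assumptions for illustration, not the paper's stated definition.
    """
    n, p = X.shape
    if scale is None:
        scale = p
    return np.exp(X @ X.T / scale)


def kernel_kmeans_cost(K, labels):
    """Within-cluster scatter in feature space for a given partition:
    trace(K) - sum_j (1 / |C_j|) * sum_{a, b in C_j} K[a, b].
    """
    labels = np.asarray(labels)
    cost = np.trace(K)
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        cost -= K[np.ix_(idx, idx)].sum() / idx.size
    return cost


def sample_gaussian_clusters(n_per_cluster, p, k, separation, rng):
    """Hypothetical generator: k spherical Gaussians in R^p whose means
    have Euclidean norm `separation`, with identity covariance."""
    means = rng.normal(size=(k, p))
    means *= separation / np.linalg.norm(means, axis=1, keepdims=True)
    labels = np.repeat(np.arange(k), n_per_cluster)
    X = means[labels] + rng.normal(size=(labels.size, p))
    return X, labels


rng = np.random.default_rng(0)
X, true_labels = sample_gaussian_clusters(
    n_per_cluster=20, p=200, k=3, separation=10.0, rng=rng
)
K = exponential_kernel(X)

# Compare the planted partition against a random relabeling of the same sizes.
true_cost = kernel_kmeans_cost(K, true_labels)
shuffled_cost = kernel_kmeans_cost(K, rng.permutation(true_labels))
print(f"cost of planted partition:  {true_cost:.3f}")
print(f"cost of shuffled partition: {shuffled_cost:.3f}")
```

With well-separated means, the planted partition should have a lower cost than a random relabeling, which is the quantity kernel k-means seeks to minimize over all partitions.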

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1912.00458 [stat.ML]
	(or arXiv:1912.00458v1 [stat.ML] for this version)

Submission history

From: Leena Chennuru Vankadara [view email]
[v1] Sun, 1 Dec 2019 18:05:49 GMT (133kb,D)
