We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

math.ST

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Mathematics > Statistics Theory

Title: Convergence of Graph Laplacian with kNN Self-tuned Kernels

Abstract: Kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij}= k_0( \frac{ \| x_i - x_j \|^2} {\sigma^2} )$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$, and a common practice called self-tuned kernel adaptively sets a $\sigma_i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance. When $x_i$'s are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, unlike with fixed-bandwidth kernels, theoretical results of graph Laplacian convergence with self-tuned kernels have been incomplete. This paper proves the convergence of graph Laplacian operator $L_N$ to manifold (weighted-)Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha)}_{ij} = k_0( \frac{ \| x_i - x_j \|^2}{ \epsilon \hat{\rho}(x_i) \hat{\rho}(x_j)})/\hat{\rho}(x_i)^\alpha \hat{\rho}(x_j)^\alpha$, where $\hat{\rho}$ is the estimated bandwidth function {by kNN}, and the limiting operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\Delta_p$. Specifically, we prove the point-wise convergence of $L_N f $ and convergence of the graph Dirichlet form with rates. Our analysis is based on first establishing a $C^0$ consistency for $\hat{\rho}$ which bounds the relative estimation error $|\hat{\rho} - \bar{\rho}|/\bar{\rho}$ uniformly with high probability, where $\bar{\rho} = p^{-1/d}$, and $p$ is the data density function. Our theoretical results reveal the advantage of self-tuned kernel over fixed-bandwidth kernel via smaller variance error in low-density regions. In the algorithm, no prior knowledge of $d$ or data density is needed. The theoretical results are supported by numerical experiments on simulated data and hand-written digit image data.
Subjects: Statistics Theory (math.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2011.01479 [math.ST]
  (or arXiv:2011.01479v2 [math.ST] for this version)

Submission history

From: Xiuyuan Cheng [view email]
[v1] Tue, 3 Nov 2020 04:55:33 GMT (2777kb,D)
[v2] Wed, 10 Feb 2021 02:17:46 GMT (2077kb,D)

Link back to: arXiv, form interface, contact.