Title: Taming Nonconvexity in Kernel Feature Selection -- Favorable Properties of the Laplace Kernel

Abstract: Kernel-based feature selection is an important tool in nonparametric statistics. Despite its many practical applications, there is little statistical theory available to support the method. A core challenge is that the objective functions of the optimization problems used to define kernel-based feature selection are nonconvex. The literature has studied only the statistical properties of the \emph{global optima}, a mismatch given that the gradient-based algorithms available for nonconvex optimization can guarantee convergence only to local minima. Studying the full landscape associated with kernel-based methods, we show that feature selection objectives using the Laplace kernel (and other $\ell_1$ kernels) come with statistical guarantees that other kernels, including the ubiquitous Gaussian kernel (and other $\ell_2$ kernels), do not possess. Based on a sharp characterization of the gradient of the objective function, we show that $\ell_1$ kernels eliminate unfavorable stationary points that appear when using an $\ell_2$ kernel. Armed with this insight, we establish statistical guarantees for $\ell_1$ kernel-based feature selection that do not require reaching the global minimum. In particular, we establish model-selection consistency of $\ell_1$-kernel-based feature selection in recovering main effects and hierarchical interactions in the nonparametric setting with $n \sim \log p$ samples.
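To make the distinction between $\ell_1$ and $\ell_2$ kernels concrete, the following is a minimal sketch of the two kernel families with nonnegative per-feature weights. The function names, the `weights` parameter, and the exact way the weights enter the distance are illustrative assumptions, not the parameterization used in the paper; the abstract states only that the Laplace kernel is an $\ell_1$ kernel and the Gaussian kernel an $\ell_2$ kernel.

```python
import math

def laplace_kernel(x, y, weights):
    """l1-type kernel: exp(-sum_j w_j * |x_j - y_j|).

    `weights` is a hypothetical per-feature weight vector (assumption);
    a zero weight removes that feature from the kernel entirely.
    """
    dist = sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
    return math.exp(-dist)

def gaussian_kernel(x, y, weights):
    """l2-type kernel: exp(-sum_j w_j * (x_j - y_j)**2).

    Same hypothetical weighting, applied to squared differences.
    """
    dist = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    return math.exp(-dist)

# Feature 2 has weight 0, so neither kernel depends on it.
x, y = [0.0, 0.0], [1.0, 5.0]
k_l1 = laplace_kernel(x, y, [1.0, 0.0])
k_l2 = gaussian_kernel(x, y, [1.0, 0.0])
```

In a sketch like this, feature selection would amount to choosing which weights are nonzero. How the weights are fitted, and why the landscape differs between the two kernels, is the subject of the paper and is not reproduced here.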
Comments: 26 pages main text; 74 pages total; appendix rewritten (typo fixed; proof structure reorganized)
Subjects: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
Cite as: arXiv:2106.09387 [math.ST]
  (or arXiv:2106.09387v3 [math.ST] for this version)

Submission history

From: Feng Ruan
[v1] Thu, 17 Jun 2021 11:05:48 GMT (1466kb,D)
[v2] Tue, 29 Jun 2021 05:15:48 GMT (1471kb,D)
[v3] Wed, 25 May 2022 12:46:04 GMT (304kb,D)
