Title: Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Abstract: Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than $p=100$s of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting $k=5$ covariates from $p=300$ variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of $1$-$2\%$ in practice within minutes for $p=100$s or hours for $p=1,000$s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical datasets, we illustrate our approach's ability to derive interpretable principal components tractably at scale.
Comments: Revision submitted to JMLR
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Statistics Theory (math.ST); Computation (stat.CO)
Journal reference: Journal of Machine Learning Research 23(13):1-35, 2022
Cite as: arXiv:2005.05195 [math.OC]
  (or arXiv:2005.05195v4 [math.OC] for this version)
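
Illustrative sketch (not from the paper): the abstract's notions of a certifiably optimal sparse component and of a bound gap can be made concrete on a toy covariance matrix. The code below assumes the standard sparse PCA formulation, maximize $x^\top \Sigma x$ subject to $\|x\|_2 = 1$ and $\|x\|_0 \le k$, for which the optimal value over a fixed support $S$ is the largest eigenvalue of the submatrix $\Sigma_{S,S}$. The exhaustive search and the forward-selection heuristic here are simple stand-ins, not the paper's cutting-plane method or its greedy rounding scheme, and the upper bound $\lambda_{\max}(\Sigma)$ is the trivial relaxation that drops the sparsity constraint, not the paper's semidefinite relaxation. All function names are hypothetical.

```python
import math
from itertools import combinations


def eigenvalues_symmetric(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(map(float, row)) for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle with |theta| <= pi/4 that zeroes a[p][q].
                diff = a[q][q] - a[p][p]
                sign = 1.0 if diff >= 0.0 else -1.0
                theta = 0.5 * math.atan2(sign * 2.0 * a[p][q], abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                for r in range(n):  # update columns p and q (A J)
                    arp, arq = a[r][p], a[r][q]
                    a[r][p] = c * arp - s * arq
                    a[r][q] = s * arp + c * arq
                for r in range(n):  # update rows p and q (J^T A J)
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r] = c * apr - s * aqr
                    a[q][r] = s * apr + c * aqr
    return [a[i][i] for i in range(n)]


def top_eigenvalue(matrix):
    """Largest eigenvalue of a symmetric matrix."""
    return max(eigenvalues_symmetric(matrix))


def submatrix(sigma, support):
    """Principal submatrix of sigma indexed by the given support."""
    return [[sigma[i][j] for j in support] for i in support]


def exact_sparse_pca(sigma, k):
    """Enumerate every support of size k; exact, but exponential in p."""
    best = max(
        combinations(range(len(sigma)), k),
        key=lambda s: top_eigenvalue(submatrix(sigma, s)),
    )
    return top_eigenvalue(submatrix(sigma, best)), best


def greedy_sparse_pca(sigma, k):
    """Forward selection: add the index that most increases the objective."""
    support = []
    for _ in range(k):
        remaining = [j for j in range(len(sigma)) if j not in support]
        best_j = max(
            remaining,
            key=lambda j: top_eigenvalue(submatrix(sigma, support + [j])),
        )
        support.append(best_j)
    support = tuple(sorted(support))
    return top_eigenvalue(submatrix(sigma, support)), support


def bound_gap(lower, upper):
    """Relative gap between a feasible value and an upper bound."""
    return (upper - lower) / upper


# Toy 4x4 covariance matrix (made up for illustration).
sigma = [
    [4.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
exact_value, exact_support = exact_sparse_pca(sigma, k=2)
greedy_value, greedy_support = greedy_sparse_pca(sigma, k=2)
upper_bound = top_eigenvalue(sigma)  # trivial relaxation: drop sparsity
print(exact_support, greedy_support, bound_gap(greedy_value, upper_bound))
```

Any feasible support gives a lower bound and any relaxation gives an upper bound, so a small gap between the two certifies near-optimality without enumerating all supports. Enumeration is only viable for tiny $p$, which is why scalable exact and relaxation-based methods matter at the sizes the abstract mentions.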

Submission history

From: Ryan Cory-Wright
[v1] Mon, 11 May 2020 15:39:23 GMT (55kb)
[v2] Wed, 21 Oct 2020 21:11:28 GMT (65kb)
[v3] Wed, 28 Apr 2021 21:04:39 GMT (109kb,D)
[v4] Wed, 25 Aug 2021 15:42:09 GMT (110kb,D)