Title: Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

Abstract: The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis. The method consists of randomly "puncturing" both the data matrix $X\in\mathbb{C}^{p\times n}$ (or $\mathbb{R}^{p\times n}$) and its corresponding kernel (Gram) matrix $K$ through Bernoulli masks: $S\in\{0,1\}^{p\times n}$ for $X$ and $B\in\{0,1\}^{n\times n}$ for $K$. The resulting "two-way punctured" kernel is thus given by $K=\frac{1}{p}[(X \odot S)^{\sf H} (X \odot S)] \odot B$, where $\odot$ denotes the entrywise (Hadamard) product and ${}^{\sf H}$ the conjugate transpose. We demonstrate that, for $X$ composed of independent columns drawn from a Gaussian mixture model, as $n,p\to\infty$ with $p/n\to c_0\in(0,\infty)$, the spectral behavior of $K$ (its limiting eigenvalue distribution, as well as its isolated eigenvalues and eigenvectors) is fully tractable and exhibits a series of counter-intuitive phenomena. We notably prove, and empirically confirm on GAN-generated image databases, that it is possible to drastically puncture the data, thereby providing potentially large computational and storage gains, at virtually constant (clustering or PCA) performance. This preliminary study thus opens the path towards rethinking, from a large-dimensional standpoint, the computational and storage costs of elementary machine learning models.
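The punctured kernel formula can be illustrated with a minimal NumPy sketch, assuming (i) each entry of $X$ is kept independently with probability `eps_S` and each kernel entry with probability `eps_B` (hypothetical parameter names), (ii) $B$ is symmetrized with its diagonal kept so that $K$ stays Hermitian, and (iii) two-class spectral clustering is done by taking the sign of the top eigenvector. These choices are assumptions of the sketch, not necessarily the exact setting of the paper. The dense matrix product is used only for readability; it shows the definition of $K$, not the computational savings.

```python
import numpy as np


def two_way_punctured_kernel(X, eps_S, eps_B, rng=None):
    """Sketch of K = (1/p) [(X * S)^H (X * S)] * B with Bernoulli masks S, B.

    eps_S and eps_B are hypothetical names for the probabilities of keeping
    an entry of X and an entry of the kernel, respectively.
    """
    rng = np.random.default_rng(rng)
    p, n = X.shape
    # Data mask S in {0,1}^{p x n}: keep each entry of X with probability eps_S.
    S = rng.random((p, n)) < eps_S
    XS = X * S
    # Kernel mask B in {0,1}^{n x n}: drawn on the strict upper triangle and
    # symmetrized so that K stays Hermitian; the diagonal is kept.
    # Both choices are assumptions of this sketch.
    upper = np.triu(rng.random((n, n)) < eps_B, k=1)
    B = upper | upper.T
    np.fill_diagonal(B, True)
    # Dense product for clarity only: an efficient version would evaluate
    # only the entries where B is 1 and exploit the sparsity of X * S.
    K = (XS.conj().T @ XS) / p
    return K * B, S, B


def spectral_labels(K):
    """Two-class clustering: sign of the top eigenvector (real data assumed)."""
    _, vecs = np.linalg.eigh(K)  # eigenvalues are returned in ascending order
    return np.where(vecs[:, -1].real >= 0, 1, -1)


# Usage: two-class Gaussian mixture x_i = y_i * mu + z_i, with z_i ~ N(0, I_p).
rng = np.random.default_rng(0)
p, n = 200, 400
y = np.where(np.arange(n) < n // 2, 1, -1)
mu = np.full(p, 0.5)
X = np.outer(mu, y) + rng.standard_normal((p, n))

K, S, B = two_way_punctured_kernel(X, eps_S=0.5, eps_B=0.5, rng=1)
y_hat = spectral_labels(K)
# Labels are recovered only up to a global sign flip.
accuracy = max(np.mean(y_hat == y), np.mean(y_hat == -y))
print(f"kept {S.mean():.0%} of X, {B.mean():.0%} of K, accuracy {accuracy:.2f}")
```

Here about half of the entries of $X$ and of $K$ are discarded, and the class means are still well separated enough for the top eigenvector to carry the labels. For complex-valued data, the sign of the real part is not meaningful on its own, and the eigenvector's arbitrary phase would need to be aligned first.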
Comments: 24 pages (10 for the core paper, 14 for the proofs in the supplementary materials), 10 figures. Final version to be published in the ICML 2021 proceedings
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2102.12293 [cs.LG]
  (or arXiv:2102.12293v3 [cs.LG] for this version)

Submission history

From: Florent Chatelain
[v1] Wed, 24 Feb 2021 14:01:58 GMT (1507kb,D)
[v2] Thu, 25 Feb 2021 16:16:06 GMT (1517kb,D)
[v3] Mon, 17 May 2021 06:58:23 GMT (1898kb,D)