Lazy stochastic principal component analysis

Wojnowicz, Michael; Nguyen, Dinh; Li, Li; Zhao, Xuan

(Submitted on 21 Sep 2017)

Abstract: Stochastic principal component analysis (SPCA) has become a popular dimensionality reduction strategy for large, high-dimensional datasets. We derive a simplified algorithm, called Lazy SPCA, which has reduced computational complexity and is better suited for large-scale distributed computation. We prove that SPCA and Lazy SPCA find the same approximations to the principal subspace, and that the pairwise distances between samples in the lower-dimensional space are invariant to whether SPCA is executed lazily or not. Empirical studies find downstream predictive performance to be identical for both methods, and superior to random projections, across a range of predictive models (linear regression, logistic lasso, and random forests). In our largest experiment with 4.6 million samples, Lazy SPCA reduced 43.7 hours of computation to 9.9 hours. Overall, Lazy SPCA relies exclusively on matrix multiplications, apart from an operation on a small square matrix whose size depends only on the target dimensionality.
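
The abstract does not spell out either algorithm, so the following is only an illustrative sketch and not the authors' Lazy SPCA. It assumes NumPy and uses hypothetical helper names (`random_sketch`, `basis_via_qr`, `basis_via_small_gram`, `pairwise_distances`). It shows one general way that a factorization of a tall matrix can be replaced by matrix multiplications plus an operation on a small K x K matrix, while leaving the spanned subspace and the pairwise distances between samples unchanged.

```python
import numpy as np

def random_sketch(X, k, seed=0):
    # Project the n x d data onto k random Gaussian directions (n x k result).
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((X.shape[1], k))
    return X @ omega

def basis_via_qr(Y):
    # Orthonormal basis for the column space of Y via QR on the tall matrix.
    Q, _ = np.linalg.qr(Y)
    return Q

def basis_via_small_gram(Y):
    # Assumed alternative: only matrix multiplications on the tall matrix,
    # plus an eigendecomposition of the small k x k Gram matrix Y^T Y.
    gram = Y.T @ Y
    evals, evecs = np.linalg.eigh(gram)
    # Requires Y to have full column rank so that all eigenvalues are positive.
    return (Y @ evecs) / np.sqrt(evals)

def pairwise_distances(Z):
    # Euclidean distances between all pairs of rows of Z.
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    return np.sqrt(np.maximum(d2, 0.0))

if __name__ == "__main__":
    X = np.random.default_rng(1).standard_normal((200, 50))
    Y = random_sketch(X, k=5)
    Q_qr = basis_via_qr(Y)
    Q_gram = basis_via_small_gram(Y)
    # Same subspace: the two orthogonal projectors coincide.
    print(np.allclose(Q_qr @ Q_qr.T, Q_gram @ Q_gram.T, atol=1e-8))
    # Same geometry: the two bases differ only by a k x k orthogonal transform.
    print(np.allclose(pairwise_distances(Q_qr), pairwise_distances(Q_gram), atol=1e-6))
```

The two bases generally differ entry by entry, but they are related by an orthogonal K x K transform, which is why distances between rows are preserved. The Gram-matrix route can be less numerically stable than QR when the sketch is ill-conditioned, so this is a sketch of the idea rather than a recommended implementation.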

Comments:	To be published in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW)
Subjects:	Machine Learning (stat.ML)
Cite as:	arXiv:1709.07175 [stat.ML]
	(or arXiv:1709.07175v1 [stat.ML] for this version)

Submission history

From: Michael Wojnowicz
[v1] Thu, 21 Sep 2017 06:43:49 GMT (3197kb,D)
