Computer Science > Distributed, Parallel, and Cluster Computing
Title: Randomized algorithms for distributed computation of principal component analysis and singular value decomposition
(Submitted on 27 Dec 2016 (v1), revised 31 May 2017 (this version, v3), latest version 1 Jan 2018 (v4))
Abstract: As illustrated via numerical experiments with an implementation in Spark (the popular platform for distributed computation), randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a full principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark; for instance, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.
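To make problem (2) in the abstract concrete, the following is a minimal single-machine sketch of a randomized low-rank singular value decomposition, written in NumPy. It is not the authors' Spark implementation: it uses the generic scheme of a random range finder followed by an SVD of a small projected matrix, and the names `randomized_svd`, `orthonormality_error`, `oversample`, and `n_iter` are hypothetical choices made only for this illustration. The second function measures how far the computed left singular vectors are from orthonormal, which is the quality criterion the abstract highlights.

```python
import numpy as np


def randomized_svd(a, rank, oversample=10, n_iter=2, seed=0):
    """Approximate the top `rank` singular triplets of `a`.

    Illustrative sketch only; parameter names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = a.shape
    # Sample a few extra directions beyond the target rank for accuracy.
    k = min(rank + oversample, m, n)

    # Random test matrix; the columns of a @ omega approximately span the
    # dominant part of the range of a.
    omega = rng.standard_normal((n, k))
    q, _ = np.linalg.qr(a @ omega)

    # Power iterations, re-orthonormalizing at every step for stability.
    for _ in range(n_iter):
        q, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ q)

    # Project onto the small subspace and take an exact SVD there.
    b = q.T @ a  # shape (k, n)
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small

    return u[:, :rank], s[:rank], vt[:rank]


def orthonormality_error(u):
    """Spectral norm of U^T U - I; zero for exactly orthonormal columns."""
    k = u.shape[1]
    return np.linalg.norm(u.T @ u - np.eye(k), ord=2)


if __name__ == "__main__":
    # Synthetic test matrix of exact rank 5 (an assumption for the demo).
    rng = np.random.default_rng(1)
    a = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
    u, s, vt = randomized_svd(a, rank=5)
    print("orthonormality error:", orthonormality_error(u))
    print("relative reconstruction error:",
          np.linalg.norm(a - (u * s) @ vt) / np.linalg.norm(a))
```

In a distributed setting the products `a @ omega` and `a.T @ q` would be the steps spread across workers, while the QR factorizations and the small SVD operate on much thinner matrices; how the paper organizes those steps in Spark is not shown here.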
Submission history
From: Mark Tygert
[v1] Tue, 27 Dec 2016 19:06:13 GMT (13kb)
[v2] Sat, 31 Dec 2016 22:06:19 GMT (13kb)
[v3] Wed, 31 May 2017 23:04:43 GMT (29kb)
[v4] Mon, 1 Jan 2018 20:24:15 GMT (41kb,D)