Title: Randomized algorithms for distributed computation of principal component analysis and singular value decomposition

Abstract: As illustrated via numerical experiments with an implementation in Spark (the popular platform for distributed computation), randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a full principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark; for instance, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.
Comments: 17 pages, 21 tables, 6 algorithms in pseudocode
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Numerical Analysis (math.NA); Computation (stat.CO)
Cite as: arXiv:1612.08709 [cs.DC]
  (or arXiv:1612.08709v3 [cs.DC] for this version)
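
The second problem named in the abstract, a low-rank approximation in the form of a singular value decomposition, can be illustrated with a generic randomized range-finder scheme. The following is a minimal single-machine sketch in NumPy, not the paper's Spark implementation or any of its six pseudocode algorithms. The function name `randomized_svd` and the parameters `oversample`, `power_iters`, and `seed` are assumptions introduced here for illustration only.

```python
import numpy as np


def randomized_svd(a, rank, oversample=10, power_iters=2, seed=0):
    """Approximate the top `rank` singular triplets of `a`.

    Hypothetical helper for illustration: a generic randomized
    range finder followed by an SVD of a small projected matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = a.shape
    # Use a few extra columns beyond the target rank for accuracy.
    k = min(rank + oversample, m, n)

    # Sample the range of `a` with a Gaussian test matrix, then
    # orthonormalize the samples.
    q, _ = np.linalg.qr(a @ rng.standard_normal((n, k)))

    # Power iterations sharpen the captured subspace; re-orthonormalizing
    # after every product keeps the basis numerically stable.
    for _ in range(power_iters):
        q, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ q)

    # Project onto the small subspace and decompose the k x n result.
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)

    # Map the left singular vectors back to the original row space.
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic test matrix of exact rank 5 (assumed data, not from the paper).
    a = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
    u, s, vt = randomized_svd(a, rank=5)
    # Deviation of the left singular vectors from orthonormality.
    print(np.linalg.norm(u.T @ u - np.eye(5)))
    # Reconstruction error of the rank-5 approximation.
    print(np.linalg.norm((u * s) @ vt - a))
```

Because the left singular vectors are formed as the product of two matrices with orthonormal columns, their orthonormality can be checked directly via `u.T @ u`, which is the property the abstract highlights. In a distributed setting the products with `a` would be computed across the cluster, while the small factorizations stay on a single node; that split is a design assumption of this sketch rather than a description of the paper's code.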

Submission history

From: Mark Tygert
[v1] Tue, 27 Dec 2016 19:06:13 GMT (13kb)
[v2] Sat, 31 Dec 2016 22:06:19 GMT (13kb)
[v3] Wed, 31 May 2017 23:04:43 GMT (29kb)
[v4] Mon, 1 Jan 2018 20:24:15 GMT (41kb,D)