Title: Evaluating the Impact of SDC on the GMRES Iterative Solver

Abstract: Increasing parallelism and transistor density, along with increasingly tight energy and peak power constraints, may force hardware to expose occasionally incorrect computation or storage to application codes. Silent data corruption (SDC) will likely be infrequent, yet a single SDC suffices to make numerical algorithms such as iterative linear solvers cease progress towards the correct answer. We therefore focus on the resilience of the iterative linear solver GMRES to a single transient SDC. We derive inexpensive checks that detect the effects of an SDC in GMRES and that work for a more general SDC model than a bit flip. Our experiments show that when GMRES is used as the inner solver of an inner-outer iteration, it can "run through" SDC of almost any magnitude in the computationally intensive orthogonalization phase. That is, it gets the right answer using faulty data, with no rollback required. Those SDCs that it cannot run through are caught by our detection scheme.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
DOI: 10.1109/IPDPS.2014.123
Cite as: arXiv:1311.6505 [cs.DC]
  (or arXiv:1311.6505v1 [cs.DC] for this version)
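
The abstract does not spell out the checks it derives. As a rough illustration of what an inexpensive check in the orthogonalization phase can look like, the sketch below assumes an Arnoldi process with modified Gram-Schmidt and uses a standard norm bound: since the basis vectors q_i have unit length, every upper Hessenberg entry satisfies |h_ij| = |q_i^T A q_j| <= ||A||_2 <= ||A||_F. The function name `arnoldi_with_check` and its `fault` parameter are hypothetical and exist only to inject a single corrupted value for demonstration. This is not presented as the paper's exact detection scheme.

```python
import numpy as np


def arnoldi_with_check(A, b, m, fault=None):
    """Run m Arnoldi steps (modified Gram-Schmidt) on A, starting from b.

    fault: optional (i, j, value) tuple (hypothetical, for illustration)
    that overwrites h[i, j] once to mimic a single transient SDC.
    Returns (Q, H, flagged), where flagged lists the (i, j) positions whose
    value violates the bound |h_ij| <= ||A||_F.
    """
    n = A.shape[0]
    # ||A||_2 <= ||A||_F, and the Frobenius norm is cheap to compute once.
    bound = np.linalg.norm(A, "fro")
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    flagged = []

    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):
            h = Q[:, i] @ w
            if fault is not None and fault[:2] == (i, j):
                h = fault[2]  # injected corruption
            # "not (<=)" rather than ">" so that a NaN is flagged as well.
            if not (abs(h) <= bound):
                flagged.append((i, j))
            H[i, j] = h
            w = w - h * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if not (H[j + 1, j] <= bound):
            flagged.append((j + 1, j))
        if H[j + 1, j] == 0.0:
            break  # exact breakdown: the Krylov subspace is invariant
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H, flagged
```

A corrupted value that stays below the bound passes this test undetected. That is consistent with the abstract's split between SDCs that the solver runs through and SDCs that are caught, although the sketch makes no claim about which corruptions are actually harmless.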

Submission history

From: James Elliott
[v1] Mon, 25 Nov 2013 22:19:39 GMT (1189kb,D)