Quantitative Biology > Quantitative Methods
Title: File-based localization of numerical perturbations in data analysis pipelines
(Submitted on 3 Jun 2020 (v1), last revised 29 Sep 2020 (this version, v2))
Abstract: Data analysis pipelines are known to be impacted by computational conditions, presumably due to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear. We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation. By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings.
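The localization idea in the abstract can be illustrated with a minimal sketch. This is not Spot's implementation and does not use ReproZip. It assumes that each condition's provenance has already been reduced to a mapping from process name to the checksums of the files that process read and wrote. The names ProcessRecord, processes_creating_differences, the process names, and the checksum strings are all hypothetical. A process is flagged as creating a difference when its inputs are identical across the two conditions but its outputs are not. A process whose inputs already differ is treated as propagating an earlier difference.

```python
from typing import Dict, List, NamedTuple


class ProcessRecord(NamedTuple):
    """Hypothetical per-process provenance: file path -> checksum."""
    inputs: Dict[str, str]
    outputs: Dict[str, str]


def processes_creating_differences(
    cond_a: Dict[str, ProcessRecord],
    cond_b: Dict[str, ProcessRecord],
) -> List[str]:
    """Return processes whose inputs match across conditions but whose outputs differ."""
    flagged = []
    for name, rec_a in cond_a.items():
        rec_b = cond_b.get(name)
        if rec_b is None:
            continue  # process missing in one condition: not comparable here
        same_inputs = rec_a.inputs == rec_b.inputs
        different_outputs = rec_a.outputs != rec_b.outputs
        if same_inputs and different_outputs:
            flagged.append(name)
    return flagged


# Made-up example data for two computational conditions (assumption).
condition_a = {
    "bias_correction": ProcessRecord({"t1.nii": "aa"}, {"t1_bc.nii": "bb"}),
    "linear_registration": ProcessRecord({"t1_bc.nii": "bb"}, {"t1_reg.nii": "c1"}),
    "segmentation": ProcessRecord({"t1_reg.nii": "c1"}, {"seg.nii": "d1"}),
}
condition_b = {
    "bias_correction": ProcessRecord({"t1.nii": "aa"}, {"t1_bc.nii": "bb"}),
    "linear_registration": ProcessRecord({"t1_bc.nii": "bb"}, {"t1_reg.nii": "c2"}),
    "segmentation": ProcessRecord({"t1_reg.nii": "c2"}, {"seg.nii": "d2"}),
}

print(processes_creating_differences(condition_a, condition_b))
```

In this made-up example only the registration step is flagged. The segmentation step also produces different outputs, but its inputs already differ, so the sketch treats it as downstream propagation, not as a source.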
Submission history
From: Ali Salari
[v1] Wed, 3 Jun 2020 19:11:40 GMT (770kb,D)
[v2] Tue, 29 Sep 2020 01:00:09 GMT (862kb)