We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.DC

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Distributed, Parallel, and Cluster Computing

Title: HiCOPS: High Performance Computing Framework for Tera-Scale Database Search of Mass Spectrometry based Omics Data

Abstract: Database-search algorithms, that deduce peptides from Mass Spectrometry (MS) data, have tried to improve the computational efficiency to accomplish larger, and more complex systems biology studies. Existing serial, and high-performance computing (HPC) search engines, otherwise highly successful, are known to exhibit poor-scalability with increasing size of theoretical search-space needed for increased complexity of modern non-model, multi-species MS-based omics analysis. Consequently, the bottleneck for computational techniques is the communication costs of moving the data between hierarchy of memory, or processing units, and not the arithmetic operations. This post-Moore change in architecture, and demands of modern systems biology experiments have dampened the overall effectiveness of the existing HPC workflows. We present a novel efficient parallel computational method, and its implementation on memory-distributed architectures for peptide identification tool called HiCOPS, that enables more than 100-fold improvement in speed over most existing HPC proteome database search tools. HiCOPS empowers the supercomputing database search concept for comprehensive identification of peptides, and all their modified forms within a reasonable time-frame. We demonstrate this by searching Gigabytes of experimental MS data against Terabytes of databases where HiCOPS completes peptide identification in few minutes using 72 parallel nodes (1728 cores) compared to several weeks required by existing state-of-the-art tools using 1 node (24 cores); 100 minutes vs 5 weeks; 500x speedup. Finally, we formulate a theoretical framework for our overhead-avoiding strategy, and report superior performance evaluation results for key metrics including execution time, CPU utilization, speedups, and I/O efficiency. The software will be made available at: hicops.github.io
Comments: Under peer review. 37 pages, 9 figures, 5 tables
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Molecular Networks (q-bio.MN)
Cite as: arXiv:2102.02286 [cs.DC]
  (or arXiv:2102.02286v2 [cs.DC] for this version)

Submission history

From: Fahad Saeed [view email]
[v1] Wed, 3 Feb 2021 20:48:09 GMT (22126kb,D)
[v2] Fri, 5 Feb 2021 16:54:25 GMT (22126kb,D)

Link back to: arXiv, form interface, contact.