Quantitative Biology > Genomics
Title: Comprehensive assessment of error correction methods for high-throughput sequencing data
(Submitted on 10 Jul 2020 (v1), last revised 25 Mar 2021 (this version, v2))
Abstract: The advent of DNA and RNA sequencing has revolutionized the study of genomics and molecular biology. Next-generation sequencing (NGS) technologies such as Illumina, Ion Torrent, and SOLiD have made genome sequencing fast and inexpensive. More recently, third-generation sequencing (TGS) technologies such as PacBio and Oxford Nanopore Technologies (ONT) have also been developed. Different technologies rely on different underlying sequencing methods and therefore have different error rates. Although many tools exist for correcting errors in sequencing data from NGS and TGS platforms, no standard method is yet available for evaluating the accuracy and effectiveness of these error-correction tools. In this study, we present a Software Package for Error Correction Tool Assessment on nuCLEic acid sequences (SPECTACLE), which provides comprehensive algorithms for evaluating error-correction methods for DNA and RNA sequencing on both NGS and TGS platforms. We also present a compilation of sequencing datasets for the Illumina, PacBio, and ONT platforms that pose challenging scenarios for error-correction tools. Using these datasets and SPECTACLE, we evaluate the performance of 23 error-correction tools and offer insights into their strengths and weaknesses. We hope that our methodology will standardize the evaluation of DNA and RNA error-correction tools in the future.
Submission history
From: Anand Ramachandran
[v1] Fri, 10 Jul 2020 00:41:38 GMT (751kb)
[v2] Thu, 25 Mar 2021 15:49:36 GMT (680kb)