Computer Science > Data Structures and Algorithms
Title: SOPanG 2: online searching over a pan-genome without false positives
(Submitted on 6 Apr 2020 (v1), last revised 15 Apr 2020 (this version, v2))
Abstract: Motivation: The pan-genome can be stored as an elastic-degenerate (ED) string, a recently introduced compact representation of multiple overlapping sequences. However, a search over the ED string does not indicate which individuals (if any) match the entire query.
Results: We augment the ED string with sources (individuals' indexes) and propose an extension of the SOPanG (Shift-Or for Pan-Genome) tool to report only true positive matches, omitting those not occurring in any of the haplotypes. The additional match-verification stage incurs a speed penalty of less than 3.5% in practice, which means that SOPanG 2 is able to report pattern matches in a pan-genome, mapping them onto individuals, at a single-thread throughput above 430 MB/s on real data.
Availability and implementation: SOPanG 2 can be downloaded here: github.com/MrAlexSee/sopang
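To make the idea of sources concrete, the following is a minimal Python sketch of source-aware matching over an ED string. It is not the SOPanG 2 algorithm (which builds on bit-parallel Shift-Or) but a naive brute-force search, written only to illustrate how intersecting source sets removes false positives. The data layout and the names `search`, `ed`, `sources`, and `n_individuals` are assumptions made for this illustration and are not taken from the tool.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# Hypothetical layout: an ED string is a list of segments, each segment a list
# of variant strings. sources[i] is None for a deterministic segment (shared by
# everyone), otherwise one set of individual indexes per variant of segment i.
EDString = List[List[str]]
Sources = List[Optional[List[Set[int]]]]
Match = Tuple[int, FrozenSet[int]]  # (starting segment, matching individuals)


def search(ed: EDString, sources: Sources, pattern: str,
           n_individuals: int) -> Set[Match]:
    """Naive search that keeps only matches carried by at least one individual."""
    everyone = frozenset(range(n_individuals))
    results: Set[Match] = set()

    def variant_sources(seg: int, var: int) -> FrozenSet[int]:
        seg_sources = sources[seg]
        return everyone if seg_sources is None else frozenset(seg_sources[var])

    def extend(seg: int, remaining: str, alive: FrozenSet[int], start: int) -> None:
        # `remaining` must be matched starting at the beginning of segment `seg`.
        if seg == len(ed):
            return
        for var, variant in enumerate(ed[seg]):
            now = alive & variant_sources(seg, var)
            if not now:
                continue  # no single individual follows this path
            if len(variant) >= len(remaining):
                if variant.startswith(remaining):
                    results.add((start, now))
            elif remaining.startswith(variant):
                extend(seg + 1, remaining[len(variant):], now, start)

    if not pattern:
        return results
    for seg, variants in enumerate(ed):
        for var, variant in enumerate(variants):
            alive = variant_sources(seg, var)
            for offset in range(len(variant)):
                tail = variant[offset:]
                if len(tail) >= len(pattern):
                    if tail.startswith(pattern):
                        results.add((seg, alive))
                elif pattern.startswith(tail):
                    extend(seg + 1, pattern[len(tail):], alive, seg)
    return results


# Toy pan-genome with two individuals: 0 spells "ACGACT", 1 spells "ACTAGT".
ed = [["AC"], ["G", "T"], ["A"], ["C", "G"], ["T"]]
sources = [None, [{0}, {1}], None, [{0}, {1}], None]

print(search(ed, sources, "GAC", 2))
print(search(ed, sources, "GAG", 2))
```

In this toy input, "GAG" can be spelled by mixing a variant of individual 0 with a variant of individual 1, so a search that ignores sources would report it. Here the intersection of source sets becomes empty along that path and the match is dropped, while "GAC" is reported for individual 0 only.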
Submission history
From: Aleksander Cisłak
[v1] Mon, 6 Apr 2020 22:54:58 GMT (29kb,D)
[v2] Wed, 15 Apr 2020 17:26:17 GMT (29kb,D)