
Title: SOPanG 2: online searching over a pan-genome without false positives

Abstract: Motivation: The pan-genome can be stored as an elastic-degenerate (ED) string, a recently introduced compact representation of multiple overlapping sequences. However, a search over the ED string does not indicate which individuals (if any) match the entire query.
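To make the motivation concrete, here is a minimal Python sketch. It assumes a hypothetical representation in which each segment maps a variant string to the set of sources (individuals) carrying it, and that every individual carries exactly one variant per segment. The names `ed`, `haplotype`, `matches_some_path` and `matching_sources` are illustrative and are not taken from SOPanG. The pattern GTC occurs along one path through the ED string, yet it appears in no individual's sequence: this is the kind of false positive the abstract refers to.

```python
from itertools import product

# Hypothetical toy ED string with two sources (individuals 0 and 1).
# Each segment maps a variant to the set of sources that carry it;
# a deterministic segment has one variant shared by all sources.
ALL = frozenset({0, 1})
ed = [
    {"A": ALL},
    {"C": frozenset({0}), "G": frozenset({1})},
    {"T": ALL},
    {"C": frozenset({0}), "A": frozenset({1})},
]


def haplotype(ed, source):
    """Spell one individual's sequence by taking its variant in each segment."""
    return "".join(next(v for v, srcs in seg.items() if source in srcs) for seg in ed)


def matches_some_path(ed, pattern):
    """ED-string view: does the pattern occur in any concatenation of variants?

    Enumerates every path, so it is exponential and only meant for illustration.
    """
    return any(pattern in "".join(path) for path in product(*(seg.keys() for seg in ed)))


def matching_sources(ed, pattern, sources):
    """Source-aware view: which individuals actually contain the pattern?"""
    return {s for s in sources if pattern in haplotype(ed, s)}
```

Here `matches_some_path(ed, "GTC")` is true (via the path A, G, T, C), while `matching_sources(ed, "GTC", {0, 1})` is empty, because G belongs only to individual 1 and the final C only to individual 0.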
Results: We augment the ED string with sources (individuals' indexes) and propose an extension of the SOPanG (Shift-Or for Pan-Genome) tool that reports only true positive matches, omitting those that do not occur in any of the haplotypes. The additional stage for checking the matches reduces search speed by less than 3.5% in practice, which means that SOPanG 2 can report pattern matches in a pan-genome, mapping them onto individuals, at a single-thread throughput above 430 MB/s on real data.
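The Shift-Or stage can be illustrated with a short sketch in that spirit. It assumes segments are given as plain lists of variant strings (a hypothetical layout); SOPanG's actual data structures and its source-checking stage are not reproduced here. Within each segment, every variant is scanned from the same incoming state, and the resulting states are merged with bitwise AND: in Shift-Or a 0 bit marks an active pattern prefix, so AND takes the union of prefixes that survive any variant. The function name `shift_or_ed` is illustrative.

```python
def shift_or_ed(segments, pattern):
    """Return indexes of segments in which at least one pattern occurrence ends.

    segments: list of segments, each a list of variant strings (empty allowed).
    Occurrences may span segments along any path; sources are ignored here.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1  # keep states to m bits (Python ints are unbounded)

    # Shift-Or masks: bit i of masks[c] is 0 iff pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, full) & ~(1 << i)

    hit = 1 << (m - 1)  # bit m-1 == 0 means the whole pattern just matched
    state = full        # no active prefixes before the text starts
    result = []
    for j, seg in enumerate(segments):
        merged = full
        found = False
        for variant in seg:
            d = state  # every variant starts from the segment's incoming state
            for c in variant:
                d = ((d << 1) | masks.get(c, full)) & full
                if d & hit == 0:
                    found = True
            merged &= d  # AND = union of active prefixes across variants
        state = merged
        if found:
            result.append(j)
    return result
```

For example, `shift_or_ed([["A"], ["C", "G"], ["T"], ["C", "A"]], "GTC")` reports segment 3, although no single haplotype needs to contain GTC; attributing such matches to individuals is what the additional checking stage described above is for.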
Availability and implementation: SOPanG 2 can be downloaded from github.com/MrAlexSee/sopang
Subjects: Data Structures and Algorithms (cs.DS)
MSC classes: 68W32
ACM classes: F.2.2
Cite as: arXiv:2004.03033 [cs.DS]
  (or arXiv:2004.03033v2 [cs.DS] for this version)

Submission history

From: Aleksander Cisłak
[v1] Mon, 6 Apr 2020 22:54:58 GMT (29kb,D)
[v2] Wed, 15 Apr 2020 17:26:17 GMT (29kb,D)
