Title: Fast sequence to graph alignment using the graph wavefront algorithm

Abstract: Motivation: A pan-genome graph represents a collection of genomes and encodes the sequence variations between them. It is a powerful data structure for studying multiple similar genomes. Sequence-to-graph alignment is an essential step in the construction and analysis of pan-genome graphs. However, existing algorithms incur runtime proportional to the product of sequence length and graph size, making them inefficient for aligning long sequences against large graphs. Results: We propose the graph wavefront alignment algorithm (Gwfa), a new method for aligning a sequence to a sequence graph. Although the worst-case time complexity of Gwfa is the same as that of existing algorithms, it is designed to run faster for closely matching sequences, and its runtime in practice often increases only moderately with the edit distance of the optimal alignment. On four real datasets, Gwfa is up to four orders of magnitude faster than other exact sequence-to-graph alignment algorithms. We also propose a graph pruning heuristic on top of Gwfa, which can achieve an additional $\sim$10-fold speedup on large graphs. Availability: Gwfa code is accessible at this https URL
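
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea: explore alignment states in order of increasing edit distance, following matches at no cost, so that closely matching sequences finish after visiting few states. It is not the Gwfa implementation, which works on diagonal-based wavefronts. It swaps in a plain 0-1 breadth-first search over (node, offset, query position) states. The graph representation (`nodes`, `edges`), the function name `graph_edit_distance`, and the convention that the alignment starts at the first base of `start` and may end anywhere in the graph are all assumptions made for illustration.

```python
from collections import deque

def graph_edit_distance(nodes, edges, start, query):
    """Minimum edit distance between `query` and any walk that begins at
    the first base of node `start` (the end of the walk is free).

    nodes: dict mapping node id -> sequence string (hypothetical layout)
    edges: dict mapping node id -> list of successor node ids
    """
    n = len(query)
    first = (start, 0, 0)          # (node, offset in node, query position)
    dist = {first: 0}
    queue = deque([first])
    settled = set()
    while queue:
        state = queue.popleft()
        if state in settled:
            continue
        settled.add(state)
        v, i, j = state
        d = dist[state]
        if j == n:                 # whole query consumed: optimal distance
            return d
        seq = nodes[v]
        moves = []
        if i == len(seq):
            # End of node: step to each successor without any cost.
            moves.extend(((w, 0, j), 0) for w in edges.get(v, []))
        else:
            # Match (cost 0) or mismatch (cost 1).
            moves.append(((v, i + 1, j + 1), 0 if seq[i] == query[j] else 1))
            # Skip a graph base (deletion from the query's point of view).
            moves.append(((v, i + 1, j), 1))
        # Skip a query base (insertion).
        moves.append(((v, i, j + 1), 1))
        for nxt, cost in moves:
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                # 0-1 BFS: free moves go to the front, unit-cost moves to the back.
                if cost == 0:
                    queue.appendleft(nxt)
                else:
                    queue.append(nxt)
    raise RuntimeError("unreachable: insertions can always consume the query")

# A small bubble graph: AC -> (G | T) -> CA
nodes = {0: "AC", 1: "G", 2: "T", 3: "CA"}
edges = {0: [1, 2], 1: [3], 2: [3]}
print(graph_edit_distance(nodes, edges, 0, "ACTCA"))  # → 0
print(graph_edit_distance(nodes, edges, 0, "ACACA"))  # → 1
```

Because states are settled in order of edit distance, the search stops as soon as the query is fully consumed. For a query that matches some path closely, this typically touches far fewer states than filling a full dynamic-programming table over sequence length times graph size.
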
Subjects: Genomics (q-bio.GN); Quantitative Methods (q-bio.QM)
Cite as: arXiv:2206.13574 [q-bio.GN]
  (or arXiv:2206.13574v1 [q-bio.GN] for this version)

Submission history

From: Haowen Zhang
[v1] Mon, 27 Jun 2022 18:20:44 GMT (377kb)