Title: Optimal Sequence Length Requirements for Phylogenetic Tree Reconstruction with Indels

Abstract: We consider the phylogenetic tree reconstruction problem with insertions and deletions (indels). Phylogenetic algorithms proceed under a model where sequences evolve down the model tree, and given sequences at the leaves, the problem is to reconstruct the model tree with high probability. Traditionally, sequences mutate by substitution-only processes, although some recent work considers evolutionary processes with insertions and deletions. In this paper, we improve on previous work by giving a reconstruction algorithm that simultaneously has $O(\text{poly} \log n)$ sequence length and tolerates constant indel probabilities on each edge. Our recursively-reconstructed distance-based technique provably outputs the model tree when the model tree has $O(\text{poly} \log n)$ diameter and discretized branch lengths, allowing for the probability of insertion and deletion to be non-uniform and asymmetric on each edge. Our polylogarithmic sequence length bounds improve significantly over previous polynomial sequence length bounds and match sequence length bounds in the substitution-only models of phylogenetic evolution, thereby challenging the idea that many global misalignments caused by insertions and deletions when $p_{indel}$ is large are a fundamental obstruction to reconstruction with short sequences.
Comments: Update: Many minor edits to improve clarity and presentation as suggested by STOC reviewers. The results and overall structure of the paper are unaffected. To appear in STOC 2019
Subjects: Data Structures and Algorithms (cs.DS); Probability (math.PR); Quantitative Methods (q-bio.QM)
Cite as: arXiv:1811.01121 [cs.DS]
  (or arXiv:1811.01121v3 [cs.DS] for this version)

Submission history

From: Arun Ganesh
[v1] Fri, 2 Nov 2018 23:17:17 GMT (62kb,D)
[v2] Tue, 15 Jan 2019 21:14:10 GMT (63kb,D)
[v3] Wed, 20 Feb 2019 23:08:56 GMT (134kb,D)
