Title: Score distributions of gapped multiple sequence alignments down to the low-probability tail

Abstract: The significance of the alignment score of optimally aligned DNA or amino acid sequences can be assessed from the score distribution for random sequences. This requires knowing the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences, this distribution is known analytically to be a Gumbel distribution. Distributions for gapped local alignments and for global alignments of finite lengths can only be obtained numerically. To obtain results in the small-probability region, specific rare-event algorithms based on statistical mechanics can be applied. In previous studies, this was achieved for pairwise alignments. Those studies showed that, contrary to results from earlier simple-sampling studies, strong deviations from the Gumbel distribution occur for finite sequence lengths. Here we extend these studies to multiple sequence alignments with gaps, a case that is much more relevant for practical applications in molecular biology. We study the score distributions over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after a suitable rescaling that eliminates the sequence-length dependence, the distributions for multiple alignments differ from those for pairwise alignments. Furthermore, we show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for pairwise alignments.
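As a rough illustration of the Gumbel reference case mentioned in the abstract, the following minimal sketch evaluates the upper tail P(S >= s) = 1 - exp(-exp(-lam (s - s0))) of a Gumbel distribution. The function name `gumbel_tail` and the parameters `lam` (inverse scale) and `s0` (location) are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not reproduce the rare-event algorithm or the corrections to the Gumbel form that the paper studies.

```python
import math


def gumbel_tail(s: float, lam: float, s0: float) -> float:
    """Upper-tail probability P(S >= s) of a Gumbel distribution.

    lam and s0 are illustrative parameters (inverse scale and location),
    not values from the paper.
    """
    # x = exp(-lam * (s - s0)) becomes tiny for large scores.
    x = math.exp(-lam * (s - s0))
    # 1 - exp(-x), computed with expm1 so that very small
    # tail probabilities are not rounded to zero.
    return -math.expm1(-x)


if __name__ == "__main__":
    for score in (0.0, 10.0, 100.0, 400.0):
        print(score, gumbel_tail(score, lam=1.0, s0=0.0))
```

Far in the tail the probability decays roughly as exp(-lam (s - s0)). Estimating a probability p by drawing independent random sequences needs on the order of 1/p samples, which is why simple sampling cannot reach the exponentially small region and rare-event methods are used instead.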
Subjects: Quantitative Methods (q-bio.QM); Disordered Systems and Neural Networks (cond-mat.dis-nn)
Journal reference: Phys. Rev. E 94, 022127 (2016)
DOI: 10.1103/PhysRevE.94.022127
Cite as: arXiv:1512.04982 [q-bio.QM]
  (or arXiv:1512.04982v1 [q-bio.QM] for this version)

Submission history

From: Pascal Fieth
[v1] Mon, 7 Dec 2015 16:22:15 GMT (78kb)
