Title: Gaussian Process-Based Bayesian Nonparametric Inference of Population Trajectories from Gene Genealogies

Abstract: Changes in population size influence the genetic diversity of the population and, as a result, leave a signature of these changes in the genomes of individuals in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as the glue between the demographic history of the population and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. If these coalescent times are viewed as a point process, estimating population size trajectories is equivalent to estimating the conditional intensity of this point process. Therefore, our inverse problem is similar to estimating the intensity function of an inhomogeneous Poisson process. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human influenza A viruses. In both cases, we recover more of the commonly believed features of the viral demographic histories than the GMRF approach does. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method.
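The reduction described in the abstract can be made concrete. In the standard coalescent formulation, while k lineages remain, the next coalescence occurs at rate C(k,2)/N_e(t), so the coalescent times form a point process whose conditional intensity is fixed by the trajectory N_e(t), and the likelihood depends on the genealogy only through those times. The sketch below is a minimal illustration under stated assumptions, not the paper's inference code: all samples are taken at time 0 (isochronous sampling), time runs backward in units matching N_e(t), the helper names are hypothetical, the intensity integral uses a simple midpoint rule, and simulation uses Lewis-Shedler thinning (a standard technique, not necessarily the sampler used in the paper).

```python
import math
import random
from typing import Callable, List


def coalescent_rate(k: int, ne: float) -> float:
    """Coalescence rate C(k, 2) / ne while k lineages remain."""
    return k * (k - 1) / 2.0 / ne


def simulate_coalescent_times(n: int, ne: Callable[[float], float],
                              ne_min: float, rng: random.Random) -> List[float]:
    """Simulate the n - 1 coalescent times (backward from t = 0) by thinning.

    Assumes ne(t) >= ne_min > 0 for all t >= 0, so C(k, 2) / ne_min bounds
    the intensity while k lineages remain.
    """
    t, k, times = 0.0, n, []
    while k > 1:
        bound = coalescent_rate(k, ne_min)
        t += rng.expovariate(bound)  # candidate point from a homogeneous process
        if rng.random() < coalescent_rate(k, ne(t)) / bound:
            times.append(t)  # accept: a coalescence occurs at time t
            k -= 1
    return times


def coalescent_log_likelihood(times: List[float], n: int,
                              ne: Callable[[float], float],
                              steps: int = 1000) -> float:
    """Log density of sorted coalescent times given the trajectory ne(t).

    Each interval with k lineages contributes the log intensity at its end
    minus the integrated intensity over the interval (midpoint rule).
    """
    loglik, start, k = 0.0, 0.0, n
    for end in times:
        h = (end - start) / steps
        integral = h * sum(coalescent_rate(k, ne(start + (j + 0.5) * h))
                           for j in range(steps))
        loglik += math.log(coalescent_rate(k, ne(end))) - integral
        start, k = end, k - 1
    return loglik


if __name__ == "__main__":
    def trajectory(t: float) -> float:
        """Hypothetical N_e(t), bounded between 1 and 2."""
        return 2.0 - math.exp(-t)

    rng = random.Random(1)
    times = simulate_coalescent_times(10, trajectory, 1.0, rng)
    print(len(times), coalescent_log_likelihood(times, 10, trajectory))
```

With a constant N_e the integral is exact and each interval contributes log(C(k,2)/N_e) minus C(k,2)/N_e times the interval length, which is a convenient check. A nonparametric method then places a prior on N_e(t) and works with this likelihood.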
Comments: 25 pages, 7 figures. Revised version: added a total variation metric for comparing methods; updated the influenza example with new GP results; added discussion and implementation of alternative GP kernels, namely Ornstein-Uhlenbeck (OU) and a sparse approximation of integrated Brownian motion.
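The revision note mentions Ornstein-Uhlenbeck and integrated Brownian motion kernels. The sketch below gives their standard covariance functions and draws a zero-mean GP path on a grid through a Cholesky factor of the covariance matrix. It is illustrative only: the parameter names (sigma2, length), the jitter value, and the grid are assumptions, the sparse approximation mentioned in the note is not implemented, and this is not the paper's code.

```python
import math
import random
from typing import Callable, List

Kernel = Callable[[float, float], float]


def ou_kernel(sigma2: float, length: float) -> Kernel:
    """Ornstein-Uhlenbeck covariance: sigma2 * exp(-|s - t| / length)."""
    return lambda s, t: sigma2 * math.exp(-abs(s - t) / length)


def integrated_bm_kernel(sigma2: float) -> Kernel:
    """Covariance of X(t) = integral_0^t W(u) du, W Brownian motion, s, t >= 0.

    For s <= t this equals sigma2 * s^2 * (3t - s) / 6.
    """
    def k(s: float, t: float) -> float:
        a, b = min(s, t), max(s, t)
        return sigma2 * a * a * (3.0 * b - a) / 6.0
    return k


def cholesky(matrix: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][m] * lower[j][m] for m in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    return lower


def sample_gp(kernel: Kernel, grid: List[float], rng: random.Random,
              jitter: float = 1e-6) -> List[float]:
    """Draw one zero-mean GP path on a grid; jitter stabilizes the factorization."""
    cov = [[kernel(s, t) + (jitter if i == j else 0.0)
            for j, t in enumerate(grid)] for i, s in enumerate(grid)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in grid]
    return [sum(lower[i][m] * z[m] for m in range(i + 1)) for i in range(len(grid))]


if __name__ == "__main__":
    # Start the grid at t > 0: integrated Brownian motion has zero variance at t = 0.
    grid = [0.5 * i for i in range(1, 11)]
    rng = random.Random(0)
    for name, kern in [("OU", ou_kernel(1.0, 1.0)), ("IBM", integrated_bm_kernel(1.0))]:
        path = sample_gp(kern, grid, rng)
        print(name, [round(x, 3) for x in path])
```

The OU kernel yields rough, mean-reverting paths, whereas integrated Brownian motion yields smoother, once-differentiable paths. Dense Cholesky factorization costs O(m^3) for m grid points, which is one reason sparse approximations can be attractive for long trajectories.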
Subjects: Methodology (stat.ME); Populations and Evolution (q-bio.PE)
Cite as: arXiv:1112.4138 [stat.ME]
  (or arXiv:1112.4138v2 [stat.ME] for this version)

Submission history

From: Vladimir Minin
[v1] Sun, 18 Dec 2011 09:31:31 GMT (274kb,D)
[v2] Sun, 14 Oct 2012 22:10:57 GMT (241kb,D)
