Title: Hamiltonian zigzag speeds up large-scale learning of direct effects among mixed-type biological traits

Abstract: Inferring correlation among mixed-type biological traits while controlling for the evolutionary relationship among taxa is of great scientific interest yet remains computationally challenging. The recently developed phylogenetic multivariate probit model accommodates binary and continuous traits by assuming latent parameters underlying binary traits. The most expensive inference step is to sample the latent parameters from their conditional posterior, which is a high-dimensional truncated normal. The current best approach uses the bouncy particle sampler (BPS) optimized with a linear-order gradient evaluation method that employs a dynamic programming strategy on the directed acyclic structure of the phylogeny. Despite its significant improvement over previous methods, BPS has difficulty exploring the parameter space as sample sizes increase and fails to provide reliable estimates for the across-trait partial correlation that describes the direct effects among traits. We develop a new inference scheme that highlights Zigzag Hamiltonian Monte Carlo (Zigzag-HMC), a variant of traditional HMC that uses Laplace momentum. Zigzag-HMC can utilize the same gradient evaluation method that speeds up BPS, yet it is much more efficient. We further improve the efficiency by jointly updating the latent parameters and correlation elements using a differential operator splitting technique. In an application exploring HIV-1 evolution that requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix, our method yields a $5\times$ speedup compared to BPS and makes it possible to estimate the direct effects among important viral mutations and virulence. We also extend the phylogenetic probit model to categorical traits for broader applicability, and demonstrate its use in studying Aquilegia flower and pollinator co-evolution.
Comments: 24 pages, 4 figures, 3 tables
Subjects: Methodology (stat.ME); Populations and Evolution (q-bio.PE); Computation (stat.CO)
Cite as: arXiv:2201.07291 [stat.ME]
  (or arXiv:2201.07291v1 [stat.ME] for this version)

Submission history

From: Zhenyu Zhang
[v1] Tue, 18 Jan 2022 20:07:00 GMT (408kb,D)
[v2] Mon, 7 Mar 2022 21:10:26 GMT (1645kb,D)
[v3] Wed, 7 Sep 2022 22:13:25 GMT (2110kb,D)
