
Title: Assignment of endogenous retrovirus integration sites using a mixture model

Abstract: Structural variation occurs in the genomes of individuals because repetitive genome elements, such as endogenous retroviruses (ERVs), occupy different positions in different individuals. The presence or absence of an ERV can be determined by identifying its junction with the host genome using high-throughput sequencing technology and a clustering algorithm. The resulting data give, for each sampled individual, the number of sequence reads assigned to each ERV-host junction sequence. Because the number of reads from an individual integration site is highly variable, it is difficult to determine whether a site is present when its read count is low. We present a novel two-component mixture of negative binomial distributions to model these counts and to assign a probability that a given ERV is present in a given individual. We explain why our approach is superior to existing alternatives, including another form of two-component mixture model and the much more common approach of selecting a threshold count for declaring the presence of an ERV. We apply our method to a data set of ERV integrations in mule deer (Odocoileus hemionus), a species for which no genomic resources are available, and demonstrate that the discovered patterns of shared integration sites contain information about animal relatedness.
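To illustrate how a two-component mixture turns a read count into a presence probability, here is a minimal sketch, assuming the mixture parameters have already been estimated and ignoring any per-individual effects such as sequencing depth that the paper's model may include. One negative binomial component ("absent", small mean) is taken to describe spurious or misassigned reads and the other ("present", larger mean) genuine integration sites; with mixing weight pi, the posterior probability of presence for count y is pi f1(y) / (pi f1(y) + (1 - pi) f0(y)). The function names `nb_logpmf` and `prob_present` and all parameter values are hypothetical, and fitting the parameters (for example by EM or a Bayesian method) is not shown.

```python
import math


def nb_logpmf(y: int, mu: float, r: float) -> float:
    """Log pmf of a negative binomial with mean mu and size (dispersion) r.

    Under this parameterization Var(Y) = mu + mu**2 / r, so a smaller r
    means more overdispersion relative to a Poisson with the same mean.
    """
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def prob_present(y: int, pi: float, mu0: float, r0: float,
                 mu1: float, r1: float) -> float:
    """Posterior probability that count y came from the 'present' component.

    pi is the prior (mixing) probability of presence; (mu0, r0) describe the
    hypothetical 'absent' component and (mu1, r1) the 'present' component.
    """
    # Log-odds of presence: prior log-odds plus the log-likelihood ratio.
    log_odds = (math.log(pi) - math.log(1.0 - pi)
                + nb_logpmf(y, mu1, r1) - nb_logpmf(y, mu0, r0))
    # Numerically stable logistic transform of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical parameters: 'absent' counts are small, 'present' counts are
# large and overdispersed; the prior probability of presence is 0.5.
params = dict(pi=0.5, mu0=1.0, r0=1.0, mu1=30.0, r1=2.0)
for count in (0, 2, 5, 10):
    print(count, round(prob_present(count, **params), 3))
```

Unlike a fixed threshold, which labels every count as simply present or absent, this rule reports a graded probability that is most uncertain exactly at the low counts where the abstract notes the difficulty lies.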
Subjects: Applications (stat.AP)
Cite as: arXiv:1510.00028 [stat.AP]
  (or arXiv:1510.00028v2 [stat.AP] for this version)

Submission history

From: Le Bao
[v1] Wed, 30 Sep 2015 20:32:10 GMT (263kb,D)
[v2] Tue, 3 Jan 2017 19:46:13 GMT (519kb,D)
