Title: MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification

Abstract: Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structure, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate quantification of RNA isoforms from RNA-seq data is a challenging computational task because of the information loss in sequencing experiments. The recent accumulation of multiple RNA-seq data sets from the same biological condition provides new opportunities to improve isoform quantification accuracy. However, existing statistical and computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore possible heterogeneity in the quality and noise levels of different samples, and may therefore yield biased and non-robust estimates. In this article, we develop a method named "joint modeling of multiple RNA-seq samples for accurate isoform quantification" (MSIQ) for more accurate and robust isoform quantification, which integrates multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify the informative group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples with more weight on the informative group. We show that MSIQ provides a consistent estimator of isoform abundance, and demonstrate the accuracy and effectiveness of MSIQ compared to alternative methods through simulation studies on D. melanogaster genes. We justify MSIQ's advantages over existing approaches via application studies on real RNA-seq data of human embryonic stem cells and brain tissues.
Subjects: Applications (stat.AP); Genomics (q-bio.GN); Quantitative Methods (q-bio.QM)
MSC classes: 97K80, 47N30
Cite as: arXiv:1603.05915 [stat.AP]
  (or arXiv:1603.05915v1 [stat.AP] for this version)

Submission history

From: Jingyi Jessica Li
[v1] Fri, 18 Mar 2016 16:48:43 GMT (4067kb,D)
[v2] Mon, 21 Aug 2017 22:49:59 GMT (5304kb,D)
[v3] Sat, 2 Dec 2017 23:25:02 GMT (5054kb,D)
