Title: Variance component score test for time-course gene set analysis of longitudinal RNA-seq data

Abstract: As gene expression measurement technology is shifting from microarrays to sequencing, the statistical tools available for their analysis must be adapted since RNA-seq data are measured as counts. Recently, it has been proposed to tackle the count nature of these data by modeling log-count reads per million as continuous variables, using nonparametric regression to account for their inherent heteroscedasticity. Adopting such a framework, we propose tcgsaseq, a principled, model-free and efficient top-down method for detecting longitudinal changes in RNA-seq gene sets. Considering gene sets defined a priori, tcgsaseq identifies those whose expression varies over time, based on an original variance component score test accounting for both covariates and heteroscedasticity without assuming any specific parametric distribution for the transformed counts. We demonstrate that despite the presence of a nonparametric component, our test statistic has a simple form and limiting distribution, and both may be computed quickly. A permutation version of the test is additionally proposed for very small sample sizes. Applied to both simulated data and two real datasets, the proposed method is shown to exhibit very good statistical properties, with an increase in stability and power when compared to the state-of-the-art methods ROAST, edgeR and DESeq2, which can fail to control the type I error under certain realistic settings. We have made the method available to the community in the R package tcgsaseq.
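The abstract's starting point, treating log-counts per million as continuous variables, can be illustrated with a minimal sketch. This is not code from the tcgsaseq package: the function name `log2_cpm` is hypothetical, and the offsets (0.5 added to each count, 1 added to each library size) are an assumption borrowed from the common voom-style convention rather than something stated in the abstract.

```python
import numpy as np

def log2_cpm(counts, prior_count=0.5):
    """Transform a genes-by-samples count matrix to log2 counts per million.

    Assumption: a small offset (prior_count) is added to each count and 1 to
    each library size so that zero counts stay finite after the log.
    """
    counts = np.asarray(counts, dtype=float)
    # Library size = total number of reads in each sample (column).
    lib_sizes = counts.sum(axis=0)
    # Scale each count by its sample's library size, then take log2.
    return np.log2((counts + prior_count) / (lib_sizes + 1.0) * 1e6)

# Toy example: 3 genes (rows) measured in 2 samples (columns).
toy_counts = np.array([[10, 20],
                       [90, 180],
                       [0, 0]])
toy_logcpm = log2_cpm(toy_counts)
```

The transformed values are on a continuous scale, but their variance still depends on the mean expression level, which is the heteroscedasticity that the abstract says is handled by nonparametric regression.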
Comments: 24 pages, 5 figures; typo corrections; now refers to "DESeq2-min test" instead of just "DESeq2"
Subjects: Applications (stat.AP); Genomics (q-bio.GN); Methodology (stat.ME)
MSC classes: 62P10
Cite as: arXiv:1605.02351 [stat.AP]
  (or arXiv:1605.02351v3 [stat.AP] for this version)

Submission history

From: Boris Hejblum
[v1] Sun, 8 May 2016 19:21:43 GMT (42kb,D)
[v2] Wed, 29 Jun 2016 20:01:35 GMT (52kb,D)
[v3] Fri, 1 Jul 2016 04:20:31 GMT (52kb,D)
[v4] Thu, 5 Jan 2017 11:48:37 GMT (52kb,D)
[v5] Fri, 6 Jan 2017 14:47:20 GMT (51kb,D)