Title: Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data

Abstract: We present a new non-negative matrix factorization model for $(0,1)$ bounded-support data based on the doubly non-central beta (DNCB) distribution, a generalization of the beta distribution. The expressiveness of the DNCB distribution is particularly useful for modeling DNA methylation datasets, which are typically highly dispersed and multi-modal; however, the model structure is sufficiently general that it can be adapted to many other domains where latent representations of $(0,1)$ bounded-support data are of interest. Although the DNCB distribution lacks a closed-form conjugate prior, several augmentations let us derive an efficient posterior inference algorithm composed entirely of analytic updates. On both real and synthetic DNA methylation datasets, our model improves out-of-sample predictive performance over state-of-the-art methods in bioinformatics. In addition, our model yields meaningful latent representations that accord with existing biological knowledge.
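To make the DNCB distribution concrete, the sketch below assumes one standard parametrization: X = Y1 / (Y1 + Y2), where Y1 and Y2 are independent noncentral chi-square variates with 2·alpha and 2·beta degrees of freedom and noncentralities lam1 and lam2. Under this convention, a noncentral chi-square is a Poisson mixture of central ones, so X is a beta variate conditional on two latent Poisson counts. This is one well-known augmentation of the kind that can yield analytic updates, but the abstract does not say which augmentations the authors use, and the paper's parametrization (for example, the factor of 1/2 on the noncentrality) may differ. The functions `sample_dncb` and `sample_dncb_direct` and the toy factors `theta`, `phi1`, `phi2` are hypothetical illustrations, not the paper's model or inference algorithm.

```python
import numpy as np


def sample_dncb(alpha, beta, lam1, lam2, size=None, rng=None):
    """Sample DNCB(alpha, beta, lam1, lam2) via a Poisson-beta augmentation.

    Assumed parametrization: X = Y1 / (Y1 + Y2), where
    Y1 ~ noncentral chi-square(df=2*alpha, nonc=lam1) and
    Y2 ~ noncentral chi-square(df=2*beta, nonc=lam2).
    """
    rng = np.random.default_rng() if rng is None else rng
    # A noncentral chi-square(df, lam) is a central chi-square(df + 2K)
    # with K ~ Poisson(lam / 2); draw the latent counts first.
    k1 = rng.poisson(np.asarray(lam1) / 2.0, size=size)
    k2 = rng.poisson(np.asarray(lam2) / 2.0, size=size)
    # Given the counts, X is an ordinary beta variate.
    return rng.beta(alpha + k1, beta + k2)


def sample_dncb_direct(alpha, beta, lam1, lam2, size=None, rng=None):
    """Reference sampler: the ratio of noncentral chi-square draws itself."""
    rng = np.random.default_rng() if rng is None else rng
    y1 = rng.noncentral_chisquare(2 * alpha, lam1, size=size)
    y2 = rng.noncentral_chisquare(2 * beta, lam2, size=size)
    return y1 / (y1 + y2)


# Hypothetical toy factorization (illustration only, not the paper's model):
# non-negative factors set per-entry noncentralities of a samples-by-loci matrix.
rng = np.random.default_rng(0)
n_samples, n_loci, n_factors = 5, 8, 3
theta = rng.gamma(1.0, 1.0, size=(n_samples, n_factors))  # per-sample loadings
phi1 = rng.gamma(1.0, 1.0, size=(n_factors, n_loci))  # larger -> values pulled toward 1
phi2 = rng.gamma(1.0, 1.0, size=(n_factors, n_loci))  # larger -> values pulled toward 0
Y = sample_dncb(1.0, 1.0, theta @ phi1, theta @ phi2, rng=rng)
print(Y.shape)  # (5, 8)
```

Comparing the empirical means of the two samplers on a large draw is a quick sanity check that the augmentation matches the direct ratio construction. The augmented form is the more useful one for inference, because conditioning on the latent counts reduces each entry to an ordinary beta likelihood.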
Comments: To appear in the Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) 2021
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Genomics (q-bio.GN); Applications (stat.AP)
Cite as: arXiv:2106.06691 [stat.ML]
  (or arXiv:2106.06691v1 [stat.ML] for this version)

Submission history

From: Aaron Schein
[v1] Sat, 12 Jun 2021 05:36:27 GMT (5340kb,D)