We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Machine Learning

Title: AI Giving Back to Statistics? Discovery of the Coordinate System of Univariate Distributions by Beta Variational Autoencoder

Abstract: Distributions are fundamental statistical elements that play essential theoretical and practical roles. The article discusses experiences of training neural networks to classify univariate empirical distributions and to represent them on the two-dimensional latent space forcing disentanglement based on the inputs of cumulative distribution functions (CDF). The latent space representation has been performed using an unsupervised beta variational autoencoder (beta-VAE). It separates distributions of different shapes while overlapping similar ones and empirically realises relationships between distributions that are known theoretically. The synthetic experiment of generated univariate continuous and discrete (Bernoulli) distributions with varying sample sizes and parameters has been performed to support the study. The representation on the latent two-dimensional coordinate system can be seen as an additional metadata of the real-world data that disentangles important distribution characteristics, such as shape of the CDF, classification probabilities of underlying theoretical distributions and their parameters, information entropy, and skewness. Entropy changes, providing an "arrow of time", determine dynamic trajectories along representations of distributions on the latent space. In addition, post beta-VAE unsupervised segmentation of the latent space based on weight-of-evidence (WOE) of posterior versus standard isotopic two-dimensional normal densities has been applied detecting the presence of assignable causes that distinguish exceptional CDF inputs.
Comments: 12 pages, 11 figures, 2 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2004.02687 [cs.LG]
  (or arXiv:2004.02687v1 [cs.LG] for this version)

Submission history

From: Alex Glushkovsky [view email]
[v1] Mon, 6 Apr 2020 14:11:13 GMT (1703kb)

Link back to: arXiv, form interface, contact.