
Title: Simultaneous Dimension Reduction and Clustering via the NMF-EM Algorithm

Abstract: Mixture models are among the most popular tools for clustering. However, when the dimension and the number of clusters are large, estimating the clusters becomes challenging, as does interpreting them. Restrictions on the parameters can be used to reduce the dimension. An example is the mixture of factor analyzers (MFA) for Gaussian mixtures. Extending MFA to non-Gaussian mixtures is not straightforward. We propose a new constraint on the parameters of non-Gaussian mixture models: the parameters of the $K$ components are combinations of elements from a small dictionary of, say, $H$ elements, with $H \ll K$. Including a nonnegative matrix factorization (NMF) step in the EM algorithm allows us to simultaneously estimate the dictionary and the parameters of the mixture. We propose the acronym NMF-EM for this algorithm, which is implemented in the R package nmfem. This original approach is motivated by the clustering of passengers from ticketing data: we apply NMF-EM to data from two Transdev public transport networks. In this case, the dictionary elements (the words) are easily interpreted as typical slots in a timetable.
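The constraint described in the abstract can be illustrated with a minimal sketch. It assumes a multinomial mixture (the abstract only says "non-Gaussian"), writes the component parameters as a product Theta = Phi @ Lam of a dictionary Phi with H columns and a mixing matrix Lam with K columns, and replaces the usual M-step with a few multiplicative NMF-style updates. The function name nmf_em_sketch, its arguments, and the specific update rule are hypothetical choices made for illustration; they are not taken from the paper or from the nmfem package.

```python
import numpy as np

def nmf_em_sketch(X, K, H, n_iter=50, n_inner=5, seed=0):
    """Illustrative EM for a multinomial mixture whose K component
    parameters are constrained to Theta = Phi @ Lam (assumed model).

    X : (n, M) array of counts. Returns (pi, Phi, Lam, R).
    """
    rng = np.random.default_rng(seed)
    n, M = X.shape
    eps = 1e-12
    # Dictionary: H columns, each a probability vector over M categories.
    Phi = rng.dirichlet(np.ones(M), size=H).T      # shape (M, H)
    # Each component is a convex combination of dictionary columns.
    Lam = rng.dirichlet(np.ones(H), size=K).T      # shape (H, K)
    pi = np.full(K, 1.0 / K)                       # mixture weights

    for _ in range(n_iter):
        Theta = Phi @ Lam                          # shape (M, K)
        # E-step: posterior responsibilities, computed in log space.
        log_r = X @ np.log(Theta + eps) + np.log(pi + eps)
        log_r -= log_r.max(axis=1, keepdims=True)
        R = np.exp(log_r)
        R /= R.sum(axis=1, keepdims=True)          # shape (n, K)

        # M-step for the weights (unconstrained, as in plain EM).
        pi = R.mean(axis=0)

        # Constrained M-step: increase sum(V * log(Phi @ Lam)) with
        # multiplicative updates, keeping columns on the simplex.
        V = X.T @ R                                # shape (M, K)
        for _ in range(n_inner):
            ratio = V / (Phi @ Lam + eps)
            Phi *= ratio @ Lam.T
            Phi /= Phi.sum(axis=0, keepdims=True) + eps
            ratio = V / (Phi @ Lam + eps)
            Lam *= Phi.T @ ratio
            Lam /= Lam.sum(axis=0, keepdims=True) + eps
    return pi, Phi, Lam, R
```

Under these assumptions, Phi holds the H dictionary elements and R.argmax(axis=1) gives a hard cluster assignment per observation. The sketch stores M*H + H*K parameters rather than M*K, which is where the dimension reduction comes from when H is much smaller than K.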
Subjects: Applications (stat.AP)
Journal reference: Advances in Data Analysis and Classification, 2021, vol. 15, no. 1, pp. 231-260
DOI: 10.1007/s11634-020-00398-4
Cite as: arXiv:1709.03346 [stat.AP]
  (or arXiv:1709.03346v2 [stat.AP] for this version)

Submission history

From: Pierre Alquier
[v1] Mon, 11 Sep 2017 11:58:33 GMT (9321kb,D)
[v2] Tue, 5 Jun 2018 21:29:34 GMT (9324kb,D)
