Title: Principal Ellipsoid Analysis (PEA): Efficient non-linear dimension reduction & clustering

Abstract: Even with the rise in popularity of over-parameterized models, simple dimensionality reduction and clustering methods, such as PCA and k-means, are still routinely used in an amazing variety of settings. A primary reason is their combination of simplicity, interpretability and computational efficiency. The focus of this article is on improving upon PCA and k-means by allowing non-linear relations in the data and more flexible cluster shapes, without sacrificing these key advantages. The key contribution is a new framework for Principal Ellipsoid Analysis (PEA), defining a simple and computationally efficient alternative to PCA that fits the best elliptical approximation through the data. We provide theoretical guarantees on the proposed PEA algorithm using Vapnik-Chervonenkis (VC) theory to show strong consistency and uniform concentration bounds. Toy experiments illustrate the performance of PEA and its ability to adapt to non-linear structure and complex cluster shapes. In a rich variety of real data clustering applications, PEA is shown to do as well as k-means for simple datasets, while dramatically improving performance in more complex settings.
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
Cite as: arXiv:2008.07110 [stat.ME]
  (or arXiv:2008.07110v2 [stat.ME] for this version)

Submission history

From: Didong Li
[v1] Mon, 17 Aug 2020 06:25:50 GMT (44kb,D)
[v2] Mon, 7 Sep 2020 03:23:24 GMT (41kb,D)
