We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: The Curse Revisited: a Newly Quantified Concept of Meaningful Distances for Learning from High-Dimensional Noisy Data

Abstract: Distances between data points are widely used in point cloud representation learning. Yet, it is no secret that under the effect of noise, these distances-and thus the models based upon them-may lose their usefulness in high dimensions. Indeed, the small marginal effects of the noise may then accumulate quickly, shifting empirical closest and furthest neighbors away from the ground truth. In this paper, we characterize such effects in high-dimensional data using an asymptotic probabilistic expression. Furthermore, while it has been previously argued that neighborhood queries become meaningless and unstable when there is a poor relative discrimination between the furthest and closest point, we conclude that this is not necessarily the case when explicitly separating the ground truth data from the noise. More specifically, we derive that under particular conditions, empirical neighborhood relations affected by noise are still likely to be true even when we observe this discrimination to be poor. We include thorough empirical verification of our results, as well as experiments that interestingly show our derived phase shift where neighbors become random or not is identical to the phase shift where common dimensionality reduction methods perform poorly or well for finding low-dimensional representations of high-dimensional data with dense noise.
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2109.10569 [cs.LG]
  (or arXiv:2109.10569v1 [cs.LG] for this version)

Submission history

From: Robin Vandaele [view email]
[v1] Wed, 22 Sep 2021 08:04:15 GMT (395kb,D)

Link back to: arXiv, form interface, contact.