We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

stat.ML

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Statistics > Machine Learning

Title: A New Family of Near-metrics for Universal Similarity

Abstract: We propose a family of near-metrics based on local graph diffusion to capture similarity for a wide class of data sets. These quasi-metametrics, as their names suggest, dispense with one or two standard axioms of metric spaces, specifically distinguishability and symmetry, so that similarity between data points of arbitrary type and form could be measured broadly and effectively. The proposed near-metric family includes the forward k-step diffusion and its reverse, typically on the graph consisting of data objects and their features. By construction, this family of near-metrics is particularly appropriate for categorical data, continuous data, and vector representations of images and text extracted via deep learning approaches. We conduct extensive experiments to evaluate the performance of this family of similarity measures and compare and contrast with traditional measures of similarity used for each specific application and with the ground truth when available. We show that for structured data including categorical and continuous data, the near-metrics corresponding to normalized forward k-step diffusion (k small) work as one of the best performing similarity measures; for vector representations of text and images including those extracted from deep learning, the near-metrics derived from normalized and reverse k-step graph diffusion (k very small) exhibit outstanding ability to distinguish data points from different classes.
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:1707.06903 [stat.ML]
  (or arXiv:1707.06903v3 [stat.ML] for this version)

Submission history

From: Chu Wang [view email]
[v1] Fri, 21 Jul 2017 14:02:46 GMT (4249kb,D)
[v2] Mon, 7 Aug 2017 02:36:52 GMT (4249kb,D)
[v3] Tue, 17 Oct 2017 08:34:25 GMT (8577kb,D)

Link back to: arXiv, form interface, contact.