Computer Science > Machine Learning
Title: Logarithmic proximity measures outperform plain ones in graph nodes clustering
(Submitted on 3 May 2016 (this version), latest version 18 Feb 2017 (v3))
Abstract: We consider a number of graph kernels and proximity measures (the commute time kernel, regularized Laplacian kernel, heat kernel, communicability, etc.) and the corresponding distances, as applied to clustering nodes in random graphs. The graph generation model specifies edge probabilities for pairs of nodes that belong to the same class or to different classes. It turns out that in most cases, logarithmic measures (i.e., measures obtained by taking the logarithm of the proximities) perform much better at distinguishing classes than the "plain" measures. A direct comparison of inter-class and intra-class distances confirms this conclusion. A possible explanation is that most kernels are multiplicative in nature, while the distances used in clustering algorithms are additive (cf. the triangle inequality). The logarithmic transformation is simply a tool for converting one nature into the other. Moreover, some distances corresponding to the logarithmic measures possess a meaningful cutpoint additivity property. In our experiments, the leader is the so-called logarithmic communicability measure, which distinctly outperforms the other measures under study.
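The contrast between a "plain" and a logarithmic measure can be illustrated with a minimal sketch. It assumes the usual definition of communicability as the matrix exponential of the adjacency matrix, K = exp(A), takes the logarithm entrywise, and converts each kernel to a distance with d(i,j) = K_ii + K_jj - 2K_ij. That conversion, the toy graph of two triangles joined by a bridge, and all function names are assumptions made for illustration only; the abstract does not specify the exact transformation or parameters used in the experiments.

```python
import math

def mat_mul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=40):
    """Approximate exp(A) with a truncated Taylor series (fine for small graphs)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # holds A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def kernel_to_distance(K):
    """Assumed conversion: d(i, j) = K_ii + K_jj - 2 * K_ij."""
    n = len(K)
    return [[K[i][i] + K[j][j] - 2.0 * K[i][j] for j in range(n)] for i in range(n)]

def mean_distance(D, labels, same_class):
    """Average distance over node pairs that are (or are not) in the same class."""
    n = len(D)
    vals = [D[i][j] for i in range(n) for j in range(i + 1, n)
            if (labels[i] == labels[j]) == same_class]
    return sum(vals) / len(vals)

# Toy graph (illustrative only): two triangles joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]
n = len(labels)
A = [[0.0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1.0

K = mat_exp(A)                                         # plain communicability
log_K = [[math.log(x) for x in row] for row in K]      # logarithmic communicability
D_plain = kernel_to_distance(K)
D_log = kernel_to_distance(log_K)

for name, D in (("plain", D_plain), ("logarithmic", D_log)):
    intra = mean_distance(D, labels, same_class=True)
    inter = mean_distance(D, labels, same_class=False)
    print(f"{name}: intra={intra:.3f} inter={inter:.3f} ratio={inter / intra:.3f}")
```

The entrywise logarithm is well defined here because every entry of exp(A) is positive for a connected graph. The printed inter-class to intra-class ratio is only a rough indicator on a single toy graph and is not a substitute for the clustering experiments described in the abstract.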
Submission history
From: Pavel Chebotarev
[v1] Tue, 3 May 2016 19:52:48 GMT (2496kb,D)
[v2] Thu, 15 Dec 2016 20:01:08 GMT (5917kb,D)
[v3] Sat, 18 Feb 2017 09:04:02 GMT (5554kb,D)