Title: Spectral clustering under degree heterogeneity: a case for the random walk Laplacian

Abstract: This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree. Under a generalised random dot product graph, the embedding provides uniformly consistent estimates of degree-corrected latent positions, with asymptotically Gaussian error. In the special case of a degree-corrected stochastic block model, the embedding concentrates about K distinct points, representing communities. These can be recovered perfectly, asymptotically, through a subsequent clustering step, without spherical projection, as commonly required by algorithms based on the adjacency or normalised, symmetric Laplacian matrices. While the estimand does not depend on degree, the asymptotic variance of its estimate does -- higher degree nodes are embedded more accurately than lower degree nodes. Our central limit theorem therefore suggests fitting a weighted Gaussian mixture model as the subsequent clustering step, for which we provide an expectation-maximisation algorithm.
Comments: 22 pages, 10 figures
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
MSC classes: 62H30 (primary), 62H12 (secondary)
Cite as: arXiv:2105.00987 [stat.ME]
  (or arXiv:2105.00987v2 [stat.ME] for this version)
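The degree-correction claim in the abstract can be illustrated with a minimal sketch. It assumes the embedding is taken as the top-d eigenvectors (by eigenvalue magnitude) of the random walk matrix D^{-1}A, computed through the symmetric normalisation for numerical stability; the paper's exact scaling convention may differ. The function name `random_walk_embedding` and the toy parameters `theta`, `z` and `B` are hypothetical, chosen only for illustration. The example is run on the noise-free expected adjacency matrix of a small degree-corrected stochastic block model, where nodes in the same community should receive identical embeddings whatever their degree weights.

```python
import numpy as np


def random_walk_embedding(A, d):
    """Top-d eigenpairs of D^{-1} A, ranked by eigenvalue magnitude.

    Illustrative sketch only; scaling may differ from the paper's definition.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    if np.any(deg <= 0):
        raise ValueError("every node needs positive degree")
    inv_sqrt = 1.0 / np.sqrt(deg)
    # D^{-1/2} A D^{-1/2} is symmetric and shares its eigenvalues with D^{-1} A.
    L_sym = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)
    top = np.argsort(-np.abs(vals))[:d]
    # If L_sym u = lam * u, then D^{-1} A (D^{-1/2} u) = lam * (D^{-1/2} u).
    return vals[top], inv_sqrt[:, None] * vecs[:, top]


# Toy degree-corrected block model (all values are assumptions for illustration).
theta = np.array([0.2, 0.5, 0.9, 0.3, 0.6, 1.0])  # node degree weights
z = np.array([0, 0, 0, 1, 1, 1])                  # community labels
B = np.array([[0.8, 0.2],
              [0.2, 0.6]])                        # block connectivity matrix

# Noise-free expected adjacency matrix: P_ij = theta_i * theta_j * B[z_i, z_j].
P = np.outer(theta, theta) * B[np.ix_(z, z)]

vals, X = random_walk_embedding(P, d=2)

# Rows within a community coincide, despite very different degree weights.
same_block_0 = np.allclose(X[0], X[1]) and np.allclose(X[1], X[2])
same_block_1 = np.allclose(X[3], X[4]) and np.allclose(X[4], X[5])
blocks_differ = not np.allclose(X[0], X[3])
```

On a sampled adjacency matrix the rows would instead scatter around these K points, with lower degree nodes scattered more widely, which is the motivation the abstract gives for a weighted Gaussian mixture model in the clustering step.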

Submission history

From: Patrick Rubin-Delanchy Dr
[v1] Mon, 3 May 2021 16:36:27 GMT (2121kb,D)
[v2] Tue, 4 May 2021 07:20:12 GMT (2121kb,D)