Title: SSumM: Sparse Summarization of Massive Graphs

Abstract: Given a graph G and the desired size k in bits, how can we summarize G within k bits, while minimizing the information loss?
Large-scale graphs have become omnipresent, posing considerable computational challenges. Analyzing such large graphs can be fast and easy if they are compressed sufficiently to fit in main memory or even cache. Graph summarization, which yields a coarse-grained summary graph with merged nodes, stands out with several advantages among graph compression techniques. Thus, a number of algorithms have been developed for obtaining a concise summary graph with little information loss or equivalently small reconstruction error. However, the existing methods focus solely on reducing the number of nodes, and they often yield dense summary graphs, failing to achieve better compression rates. Moreover, due to their limited scalability, they can be applied only to moderate-size graphs.
In this work, we propose SSumM, a scalable and effective graph-summarization algorithm that yields a sparse summary graph. SSumM not only merges nodes together but also sparsifies the summary graph, and the two strategies are carefully balanced based on the minimum description length principle. Compared with state-of-the-art competitors, SSumM is (a) Concise: yields up to 11.2X smaller summary graphs with similar reconstruction error, (b) Accurate: achieves up to 4.2X smaller reconstruction error with similarly concise outputs, and (c) Scalable: summarizes 26X larger graphs while exhibiting linear scalability. We validate these advantages through extensive experiments on 10 real-world graphs.
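As an illustration of the trade-off described in the abstract, the following is a minimal sketch of how a summary graph can be scored by reconstruction error. It is not the SSumM algorithm and does not reproduce the paper's cost function. It assumes an undirected graph without self-loops, a fixed assignment of nodes to supernodes, superedge weights equal to the number of original edges they cover, and an L1 error over all node pairs. The function names `build_summary` and `l1_reconstruction_error` and the toy graph are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_summary(edges, partition, dropped=frozenset()):
    """Count original edges between each pair of supernodes.

    Superedges listed in `dropped` are omitted, which models sparsification.
    """
    weights = Counter()
    for u, v in edges:
        key = frozenset((partition[u], partition[v]))
        if key not in dropped:
            weights[key] += 1
    return weights


def l1_reconstruction_error(nodes, edges, partition, weights):
    """Sum |actual - estimate| over all unordered node pairs.

    Assumption: each superedge spreads its weight uniformly over the
    node pairs it covers, so the estimate is weight / number_of_pairs.
    """
    sizes = Counter(partition[n] for n in nodes)
    edge_set = {frozenset(e) for e in edges}
    error = 0.0
    for u, v in combinations(nodes, 2):
        a, b = partition[u], partition[v]
        # Number of node pairs covered by the superedge {a, b}.
        pairs = sizes[a] * sizes[b] if a != b else sizes[a] * (sizes[a] - 1) // 2
        estimate = weights.get(frozenset((a, b)), 0) / pairs
        actual = 1.0 if frozenset((u, v)) in edge_set else 0.0
        error += abs(actual - estimate)
    return error


# Toy graph: 4 nodes merged into 2 supernodes.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
partition = {0: "A", 1: "A", 2: "B", 3: "B"}

full = build_summary(edges, partition)
sparse = build_summary(edges, partition, dropped={frozenset(("A", "B"))})

print(l1_reconstruction_error(nodes, edges, partition, full))
print(l1_reconstruction_error(nodes, edges, partition, sparse))
```

In this toy case the superedge between A and B covers 2 of 4 possible node pairs, so dropping it leaves the L1 error unchanged while the summary stores one superedge fewer. This is the kind of size-versus-error balance the abstract refers to, shown here only under the assumptions stated above.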
Comments: to be published in the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '20)
Subjects: Databases (cs.DB); Social and Information Networks (cs.SI)
ACM classes: H.2.8
DOI: 10.1145/3394486.3403057
Cite as: arXiv:2006.01060 [cs.DB]
  (or arXiv:2006.01060v3 [cs.DB] for this version)

Submission history

From: Kyuhan Lee
[v1] Mon, 1 Jun 2020 16:38:19 GMT (1868kb,D)
[v2] Tue, 2 Jun 2020 00:49:18 GMT (1869kb,D)
[v3] Wed, 15 Jul 2020 04:20:44 GMT (1861kb,D)
