GraphGen: A Scalable Approach to Domain-agnostic Labeled Graph Generation

Goyal, Nikhil; Jain, Harsh Vardhan; Ranu, Sayan

doi:10.1145/3366423.3380201

(Submitted on 22 Jan 2020 (v1), last revised 8 Apr 2020 (this version, v2))

Abstract: Graph generative models have been extensively studied in the data mining literature. While traditional techniques are based on generating structures that adhere to a pre-decided distribution, recent techniques have shifted towards learning this distribution directly from the data. While learning-based approaches have imparted significant improvement in quality, some limitations remain to be addressed. First, learning graph distributions introduces additional computational overhead, which limits their scalability to large graph databases. Second, many techniques only learn the structure and do not address the need to also learn node and edge labels, which encode important semantic information and influence the structure itself. Third, existing techniques often incorporate domain-specific rules and lack generalizability. Fourth, the experimentation of existing techniques is not comprehensive enough due to either using weak evaluation metrics or focusing primarily on synthetic or small datasets. In this work, we develop a domain-agnostic technique called GraphGen to overcome all of these limitations. GraphGen converts graphs to sequences using minimum DFS codes. Minimum DFS codes are canonical labels and capture the graph structure precisely along with the label information. The complex joint distributions between structure and semantic labels are learned through a novel LSTM architecture. Extensive experiments on million-sized, real graph datasets show GraphGen to be 4 times faster on average than state-of-the-art techniques while being significantly better in quality across a comprehensive set of 11 different metrics. Our code is released at this https URL
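The graph-to-sequence step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the gSpan-style convention in which each edge becomes a 5-tuple (discovery time of source, discovery time of target, source label, edge label, target label). The function name `dfs_code`, the input layout, and the neighbor tie-breaking rule are illustrative assumptions, not GraphGen's released code. The sketch emits the DFS code of one traversal only; the minimum DFS code is the smallest such code over all possible traversals under the DFS lexicographic order, which this sketch does not search for.

```python
from typing import Dict, List, Tuple

# One DFS-code entry: (t_u, t_v, label_u, edge_label, label_v)
Edge5 = Tuple[int, int, str, str, str]


def dfs_code(
    node_labels: Dict[str, str],
    edge_labels: Dict[Tuple[str, str], str],
    start: str,
) -> List[Edge5]:
    """Return the DFS code of ONE traversal of a simple, connected,
    undirected labeled graph (not necessarily the minimum DFS code)."""
    # Build an undirected adjacency list.
    adj: Dict[str, List[Tuple[str, str]]] = {v: [] for v in node_labels}
    for (u, v), lab in edge_labels.items():
        adj[u].append((v, lab))
        adj[v].append((u, lab))
    # Assumed tie-break: visit neighbors by (edge label, node label, node id).
    for v in adj:
        adj[v].sort(key=lambda x: (x[1], node_labels[x[0]], x[0]))

    time: Dict[str, int] = {}   # discovery timestamp of each vertex
    used = set()                # edges already written to the code
    code: List[Edge5] = []

    def visit(u: str) -> None:
        # Backward edges first: to already-discovered vertices,
        # ordered by increasing discovery time of the target.
        back = sorted(
            (time[v], v, lab)
            for v, lab in adj[u]
            if v in time and frozenset((u, v)) not in used
        )
        for t_v, v, lab in back:
            used.add(frozenset((u, v)))
            code.append((time[u], t_v, node_labels[u], lab, node_labels[v]))
        # Forward edges: discover new vertices and recurse.
        for v, lab in adj[u]:
            if v not in time:
                time[v] = len(time)
                used.add(frozenset((u, v)))
                code.append((time[u], time[v], node_labels[u], lab, node_labels[v]))
                visit(v)

    time[start] = 0
    visit(start)
    return code


# Toy triangle with hypothetical atom and bond labels.
nodes = {"a": "C", "b": "C", "c": "O"}
edges = {("a", "b"): "single", ("b", "c"): "double", ("a", "c"): "single"}
code = dfs_code(nodes, edges, start="a")
# code == [(0, 1, 'C', 'single', 'C'), (1, 2, 'C', 'double', 'O'), (2, 0, 'O', 'single', 'C')]
```

Because every edge appears exactly once together with both endpoint labels, the sequence carries structure and labels jointly, which is what lets a sequence model such as an LSTM be trained on it.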

Comments:	Fixed typo in Table 1; The Web Conference (WWW) 2020
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
DOI:	10.1145/3366423.3380201
Cite as:	arXiv:2001.08184 [cs.LG]
	(or arXiv:2001.08184v2 [cs.LG] for this version)

Submission history

From: Harsh Vardhan Jain
[v1] Wed, 22 Jan 2020 18:07:43 GMT (1115kb,D)
[v2] Wed, 8 Apr 2020 13:18:05 GMT (1117kb,D)
