Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining

Kaler, Tim; Stathas, Nickolas; Ouyang, Anne; Iliopoulos, Alexandros-Stavros; Schardl, Tao B.; Leiserson, Charles E.; Chen, Jie

(Submitted on 16 Oct 2021 (v1), last revised 16 Mar 2022 (this version, v2))

Abstract: Improving the training and inference performance of graph neural networks (GNNs) is faced with a challenge uncommon in general neural networks: creating mini-batches requires a lot of computation and data movement due to the exponential growth of multi-hop graph neighborhoods along network layers. Such a unique challenge gives rise to a diverse set of system design choices. We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment, under which we identify major performance bottlenecks hitherto under-explored by developers: mini-batch preparation and transfer. We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler, a shared-memory parallelization strategy, and the pipelining of batch transfer with GPU computation. We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised. Such an observation unifies training and inference, simplifying model implementation. We report comprehensive experimental results with several benchmark data sets and GNN architectures, including a demonstration that, for the ogbn-papers100M data set, our system SALIENT achieves a speedup of 3x over a standard PyTorch-Geometric implementation with a single GPU and a further 8x parallel speedup with 16 GPUs. Therein, training a 3-layer GraphSAGE model with sampling fanout (15, 10, 5) takes 2.0 seconds per epoch and inference with fanout (20, 20, 20) takes 2.4 seconds, attaining test accuracy 64.58%.
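As a rough illustration of the neighborhood sampling with per-layer fanouts mentioned in the abstract, the following is a minimal pure-Python sketch. It is not SALIENT's performance-engineered sampler or the PyTorch-Geometric API. The function names `sample_neighborhood` and `max_batch_nodes`, the adjacency-dict graph format, and the convention that fanouts are applied outward from the seed nodes are all assumptions made for this example.

```python
import math
import random


def sample_neighborhood(adj, seeds, fanouts, rng):
    """Sample a multi-hop neighborhood around `seeds` (illustrative only).

    adj     -- dict mapping each node to a list of its neighbors (assumed format)
    fanouts -- max neighbors sampled per frontier node at each hop,
               applied outward from the seeds (assumed convention)
    Returns (nodes, hops): the deduplicated nodes in the mini-batch and,
    for each hop, the list of sampled (src, dst) edges.
    """
    frontier = list(dict.fromkeys(seeds))  # deduplicate, keep order
    hops = []
    for fanout in fanouts:
        edges = []
        for dst in frontier:
            nbrs = adj.get(dst, [])
            # Keep all neighbors if there are few; otherwise sample without replacement.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            edges.extend((src, dst) for src in picked)
        hops.append(edges)
        # The next hop expands every node seen so far, not only the new ones.
        frontier = list(dict.fromkeys(frontier + [src for src, _ in edges]))
    return frontier, hops


def max_batch_nodes(num_seeds, fanouts):
    """Upper bound on mini-batch size: each hop grows the frontier by at most (1 + fanout)."""
    return num_seeds * math.prod(1 + f for f in fanouts)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy graph (hypothetical data): 1000 nodes with 30 random neighbors each.
    adj = {v: rng.sample(range(1000), 30) for v in range(1000)}
    nodes, hops = sample_neighborhood(adj, [0], (15, 10, 5), rng)
    print(len(nodes), "nodes sampled; bound is", max_batch_nodes(1, (15, 10, 5)))
```

Under these assumptions, a single seed with fanout (15, 10, 5) can pull in at most 16 x 11 x 6 = 1056 nodes, and with fanout (20, 20, 20) at most 21^3 = 9261. The bound is a product over layers, which is the exponential growth of multi-hop neighborhoods that makes mini-batch preparation and transfer expensive.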

Comments:	MLSys 2022. Code is available at this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Performance (cs.PF)
Cite as:	arXiv:2110.08450 [cs.LG]
	(or arXiv:2110.08450v2 [cs.LG] for this version)

Submission history

From: Jie Chen [view email]
[v1] Sat, 16 Oct 2021 02:41:35 GMT (149kb,D)
[v2] Wed, 16 Mar 2022 21:09:27 GMT (168kb,D)
