GC3: An Optimizing Compiler for GPU Collective Communication

Cowan, Meghan; Maleki, Saeed; Musuvathi, Madanlal; Saarikivi, Olli; Xiong, Yifan

(Submitted on 27 Jan 2022 (v1), revised 3 Feb 2022 (this version, v2), latest version 19 Jul 2022 (v3))

Abstract: Machine learning models made up of millions or billions of parameters are often trained and served on large multi-GPU systems. As models grow in size and execute on more GPUs, the collective communications used in these applications become a bottleneck. Custom collective algorithms optimized for both particular network topologies and application-specific communication patterns can alleviate this bottleneck and thus help these applications scale.
This paper introduces GC3, a system designed to make GPU communication programmable. GC3 provides a data-oriented domain-specific language for writing custom collective communication algorithms and an optimizing compiler for lowering them to an executable form, which can be executed efficiently and flexibly in an interpreter-based runtime. We used GC3 to write novel collective implementations for AllReduce and AllToAll that are up to 48% and 20% faster than optimized vendor implementations, respectively. We also demonstrate how directly implementing an application-specific collective called AllToNext in GC3 results in a 14.5× speedup over the baseline.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2201.11840 [cs.DC]
	(or arXiv:2201.11840v2 [cs.DC] for this version)

Submission history

From: Meghan Cowan
[v1] Thu, 27 Jan 2022 22:54:59 GMT (565kb,D)
[v2] Thu, 3 Feb 2022 01:07:17 GMT (567kb,D)
[v3] Tue, 19 Jul 2022 21:02:48 GMT (358kb,D)
