We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Distributed, Parallel, and Cluster Computing

Title: GC3: An Optimizing Compiler for GPU Collective Communication

Abstract: Machine learning models made up of millions or billions of parameters are trained and served on large multi-GPU systems. As models grow in size and execute on more GPUs, the collective communications used in these applications become a bottleneck. Custom collective algorithms optimized for both particular network topologies and application specific communication patterns can alleviate this bottleneck and help these applications scale. However, correctly and efficiently implementing custom algorithms is challenging.
This paper introduces GC3, a system for programmable GPU communication. GC3 provides a domain specific language for writing collective communication algorithms and an optimizing compiler for lowering them to an executable form, which can be executed efficiently and flexibly in an interpreter based runtime. We used GC3 to write novel collective algorithms for AllReduce and AllToAll that are up to $1.9\times$ and $1.3\times$ faster than hand-optimized implementations, respectively.
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2201.11840 [cs.DC]
  (or arXiv:2201.11840v3 [cs.DC] for this version)

Submission history

From: Meghan Cowan [view email]
[v1] Thu, 27 Jan 2022 22:54:59 GMT (565kb,D)
[v2] Thu, 3 Feb 2022 01:07:17 GMT (567kb,D)
[v3] Tue, 19 Jul 2022 21:02:48 GMT (358kb,D)

Link back to: arXiv, form interface, contact.