Title: Computation vs. Communication Scaling for Future Transformers on Future Hardware

Abstract: Scaling DNNs has been shown to deliver dramatic quality gains across ML problems. However, this has also led to a concomitant quadratic increase in computation cost. To cope with this, and with accelerator memory capacity failing to keep pace, training these models increasingly relies on distributed training techniques. An important question is therefore: how will compute and communication scale relative to each other as models scale and hardware evolves? A careful study that answers this question can better guide the design of future systems. To this end, this work provides a comprehensive multi-axial (algorithmic, empirical, hardware evolution) analysis of compute vs. communication (Comp-vs.-Comm) scaling for future Transformer models on future hardware. Using algorithmic analysis, we show that compute generally enjoys an edge over communication as models scale. However, when viewed through the lens of slower memory capacity scaling, these trends come under stress. Next, we craft an empirical strategy to study Comp-vs.-Comm scaling for future models/hardware using existing hardware. This allows hundreds of future model/hardware scenarios to be studied at profiling costs three orders of magnitude lower. Our experiments demonstrate that communication will be a significant portion (about 40-75%) of execution time as models and hardware evolve, and that communication which is hidden today by overlapped computation will likely become exposed. Further, the generality of our strategy makes it a strong basis for performing Comp-vs.-Comm scaling analysis for any future model. Overall, this work underscores the increasingly large role communication will play as models scale.
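A back-of-the-envelope sketch of why compute can outpace communication as Transformers grow. It is not the paper's analysis: it assumes Megatron-style tensor parallelism (two all-reduces per layer in the forward pass), a feed-forward layer of width 4H, ring all-reduce, 2-byte activations, and counts a multiply-add as 2 FLOPs. The function names `layer_flops`, `layer_comm_bytes`, and `flops_per_byte` are hypothetical.

```python
def layer_flops(batch: int, seq: int, hidden: int) -> int:
    """Approximate forward-pass matmul FLOPs of one Transformer layer (assumed 4H MLP)."""
    projections = 8 * batch * seq * hidden ** 2  # Q, K, V and output projections
    mlp = 16 * batch * seq * hidden ** 2         # H -> 4H -> H feed-forward
    attention = 4 * batch * seq ** 2 * hidden    # QK^T scores plus weighted sum of V
    return projections + mlp + attention


def layer_comm_bytes(batch: int, seq: int, hidden: int, tp: int,
                     bytes_per_elem: int = 2) -> float:
    """Bytes each device sends per layer (forward) under assumed Megatron-style tensor parallelism."""
    # Two all-reduces per layer (after attention, after the MLP), each over a B*S*H activation.
    message = batch * seq * hidden * bytes_per_elem
    # A ring all-reduce sends 2*(tp-1)/tp of the message from every device.
    return 2 * message * 2 * (tp - 1) / tp


def flops_per_byte(batch: int, seq: int, hidden: int, tp: int) -> float:
    """Per-device compute divided by per-device communication for one layer."""
    per_device_flops = layer_flops(batch, seq, hidden) / tp
    return per_device_flops / layer_comm_bytes(batch, seq, hidden, tp)


for hidden in (1024, 4096, 16384):
    print(hidden, round(flops_per_byte(batch=8, seq=2048, hidden=hidden, tp=8)))
```

Under these assumptions, per-device FLOPs grow roughly as H^2/tp while bytes sent grow as H(tp-1)/tp, so the ratio rises about linearly with the hidden size H but falls as the tensor-parallel degree tp increases. This is one way to read the abstract's point that slower memory capacity scaling, which pushes each model across more devices, stresses compute's edge.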
Subjects: Hardware Architecture (cs.AR); Distributed, Parallel, and Cluster Computing (cs.DC)
ACM classes: C.4; C.2.4
Cite as: arXiv:2302.02825 [cs.AR]
  (or arXiv:2302.02825v2 [cs.AR] for this version)

Submission history

From: Suchita Pati
[v1] Mon, 6 Feb 2023 14:43:29 GMT (1175kb)
[v2] Wed, 8 Feb 2023 22:59:08 GMT (1308kb)
[v3] Wed, 3 May 2023 01:26:17 GMT (1226kb)