Title: TorchScale: Transformers at Scale

Abstract: Large Transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries for scaling Transformers focus on improving training or inference through better parallelization. In this work, we present TorchScale, an open-source toolkit that allows researchers and developers to scale up Transformers efficiently and effectively. TorchScale implements several modeling techniques that can improve modeling generality and capability, as well as training stability and efficiency. Experimental results on language modeling and neural machine translation demonstrate that TorchScale can successfully scale Transformers to different sizes without tears. The library is available at this https URL
Comments: Work in progress
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as: arXiv:2211.13184 [cs.LG]
  (or arXiv:2211.13184v1 [cs.LG] for this version)

Submission history

From: Shuming Ma
[v1] Wed, 23 Nov 2022 17:58:51 GMT (56kb,D)
