Alternating Updates for Efficient Transformers

Baykal, Cenk; Cutler, Dylan; Dikkala, Nishanth; Ghosh, Nikhil; Panigrahy, Rina; Wang, Xin

(Submitted on 30 Jan 2023 (this version), latest version 3 Oct 2023 (v2))

Abstract: It is well established that increasing scale in deep transformer networks leads to improved quality and performance. This increase in scale often comes with an increase in compute cost and inference latency. Consequently, research into methods which help realize the benefits of increased scale without leading to an increase in the compute cost becomes important. We introduce Alternating Updates (AltUp), a simple-to-implement method to increase a model's capacity without the computational burden. AltUp enables the widening of the learned representation without increasing the computation time by working on a subblock of the representation at each layer. Our experiments on various transformer models and language tasks demonstrate the consistent effectiveness of alternating updates on a diverse set of benchmarks. Finally, we present extensions of AltUp to the sequence dimension, and demonstrate how AltUp can be synergistically combined with existing approaches, such as Sparse Mixture-of-Experts models, to obtain efficient models with even higher capacity.
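The core idea in the abstract, widening the representation while running each layer on only one subblock, can be illustrated with a toy sketch. The abstract does not say how the blocks that a layer skips are updated, so the sketch simply copies them forward unchanged. That is a simplifying assumption, not the authors' method. The names `toy_layer` and `alternating_forward` are hypothetical, and `toy_layer` is only a stand-in for an expensive transformer layer.

```python
import math


def toy_layer(block):
    """Hypothetical stand-in for a costly transformer layer on one d-wide block."""
    return [math.tanh(v) + v for v in block]


def alternating_forward(blocks, num_layers, layer_fn=toy_layer):
    """Run num_layers layers over a representation split into K blocks.

    Each layer applies layer_fn to exactly one block, cycling through the
    blocks in order. Blocks that are not selected are carried forward
    unchanged (an assumption of this sketch; the abstract does not
    specify this step).

    Returns the updated blocks and the number of layer_fn calls made.
    """
    blocks = [list(b) for b in blocks]  # copy so the caller's data is untouched
    calls = 0
    for layer_idx in range(num_layers):
        active = layer_idx % len(blocks)  # alternate the block being updated
        blocks[active] = layer_fn(blocks[active])
        calls += 1
    return blocks, calls


if __name__ == "__main__":
    # K = 2 blocks of width d = 4: the representation is 2d wide,
    # but every layer call still operates on a d-wide input.
    widened = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
    out, calls = alternating_forward(widened, num_layers=4)
    print(calls, [len(b) for b in out])
```

In this toy setting the number of layer calls equals the number of layers, and each call sees a block of width d, so the per-layer cost matches that of an unwidened model even though the stored representation is K times wider.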

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2301.13310 [cs.LG]
	(or arXiv:2301.13310v1 [cs.LG] for this version)

Submission history

From: Xin Wang
[v1] Mon, 30 Jan 2023 22:06:05 GMT (2284kb,D)
[v2] Tue, 3 Oct 2023 21:40:41 GMT (3091kb,D)
