GMAT: Global Memory Augmentation for Transformers

Gupta, Ankit; Berant, Jonathan

(Submitted on 5 Jun 2020)

Abstract: Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism and high performance. The contextualizing component of a Transformer block is the pairwise dot-product attention that has a large $\Omega(L^2)$ memory requirement for length $L$ sequences, limiting its ability to process long documents. This has been the subject of substantial interest recently, where multiple approximations were proposed to reduce the quadratic memory requirement using sparse attention matrices. In this work, we propose to augment sparse Transformer blocks with a dense attention-based global memory of length $M$ ($\ll L$) which provides an aggregate global view of the entire input sequence to each position. Our augmentation has a manageable $O(M\cdot(L+M))$ memory overhead, and can be seamlessly integrated with prior sparse solutions. Moreover, global memory can also be used for sequence compression, by representing a long input sequence with the memory representations only. We empirically show that our method leads to substantial improvement on a range of tasks, including (a) synthetic tasks that require global reasoning, (b) masked language modeling, and (c) reading comprehension.
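The abstract describes the global memory only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the M memory tokens are prepended to the sequence, that the underlying sparse attention is a local window of radius `window` (a hypothetical choice; the abstract says the memory can be combined with various prior sparse schemes), and that memory tokens attend densely to every position while each sequence token attends to its window plus all memory tokens. Counting the allowed (query, key) pairs shows where the $O(M\cdot(L+M))$ overhead comes from.

```python
from typing import Set, Tuple


def gmat_attention_pairs(seq_len: int, mem_len: int, window: int) -> Set[Tuple[int, int]]:
    """Hypothetical sketch of a global-memory attention pattern.

    Layout assumption: positions 0..mem_len-1 are global memory tokens and
    positions mem_len..mem_len+seq_len-1 are the main sequence. The sparse
    part is assumed to be a local window of +/- `window` sequence tokens.
    """
    total = mem_len + seq_len
    pairs: Set[Tuple[int, int]] = set()
    # Memory queries attend densely to every memory and sequence position:
    # mem_len * (mem_len + seq_len) pairs.
    for q in range(mem_len):
        for k in range(total):
            pairs.add((q, k))
    for i in range(seq_len):
        q = mem_len + i
        # Every sequence query reads the whole global memory: seq_len * mem_len pairs.
        for k in range(mem_len):
            pairs.add((q, k))
        # ... plus its sparse (here: local-window) neighbours in the sequence.
        for j in range(max(0, i - window), min(seq_len, i + window + 1)):
            pairs.add((q, mem_len + j))
    return pairs


def local_window_pairs(seq_len: int, window: int) -> int:
    """Number of pairs in the sparse local-window pattern alone (no memory)."""
    return sum(min(seq_len, i + window + 1) - max(0, i - window) for i in range(seq_len))


if __name__ == "__main__":
    L, M, w = 16, 4, 2
    pairs = gmat_attention_pairs(L, M, w)
    sparse = local_window_pairs(L, w)
    overhead = len(pairs) - sparse  # equals M*(M+L) + L*M = M*(M+2L)
    print(len(pairs), sparse, overhead, L * L)  # prints: 218 74 144 256
```

The overhead term $M(M+L) + LM = M(M+2L)$ grows linearly in $L$ for fixed $M$, which is consistent with the $O(M\cdot(L+M))$ bound stated above, whereas full attention over the sequence needs $L^2$ pairs.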

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Cite as:	arXiv:2006.03274 [cs.LG]
	(or arXiv:2006.03274v1 [cs.LG] for this version)

Submission history

From: Ankit Gupta
[v1] Fri, 5 Jun 2020 07:50:40 GMT (463kb,D)
