Understanding Multi-Head Attention in Abstractive Summarization

Baan, Joris; ter Hoeve, Maartje; van der Wees, Marlies; Schuth, Anne; de Rijke, Maarten

(Submitted on 10 Nov 2019)

Abstract: Attention mechanisms in deep learning architectures have often been used as a means of transparency and, as such, to shed light on the inner workings of the architectures. Recently, there has been growing interest in whether this assumption is correct. In this paper we investigate the interpretability of multi-head attention in abstractive summarization, a sequence-to-sequence task in which attention does not have an intuitive alignment role, as it does in machine translation. We first introduce three metrics to gain insight into the focus of attention heads and observe that these heads specialize towards relative positions, specific part-of-speech tags, and named entities. However, we also find that ablating and pruning these heads does not lead to a significant drop in performance, indicating redundancy. By replacing the softmax activation functions with sparsemax activation functions, we find that attention heads seemingly behave more transparently: we can ablate fewer heads, and heads score higher on our interpretability metrics. However, if we apply pruning to the sparsemax model, we find that we can prune even more heads, raising the question of whether enforced sparsity actually improves transparency. Finally, we find that relative position heads seem integral to summarization performance and persistently remain after pruning.
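The abstract names sparsemax but does not define it. As a rough illustration only, the sketch below contrasts softmax with the standard sparsemax definition (the Euclidean projection of the scores onto the probability simplex). It is not the authors' implementation; the function names and the example scores are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Dense normalization: every position gets a strictly positive weight."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Sparse normalization: project the scores onto the probability simplex.

    Low-scoring positions can receive exactly zero weight.
    """
    z_sorted = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        # Position k stays in the support while 1 + k * z_(k) > sum of top-k scores.
        if 1 + k * z > cumsum:
            tau = (cumsum - 1) / k  # threshold for the current support size
        else:
            break  # the support is a prefix of the sorted scores
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical attention scores for one query over three source tokens.
example_scores = [1.5, 1.0, 0.1]
dense_weights = softmax(example_scores)
sparse_weights = sparsemax(example_scores)
```

With these example scores, softmax spreads some weight over all three tokens, while sparsemax assigns the lowest-scoring token a weight of exactly zero. This is the kind of enforced sparsity the abstract refers to.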

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:1911.03898 [cs.CL] (or arXiv:1911.03898v1 [cs.CL] for this version)

Submission history

From: Joris Baan
[v1] Sun, 10 Nov 2019 10:56:10 GMT (636kb,D)
