Multiscale Self Attentive Convolutions for Vision and Language Modeling

Barkan, Oren

(Submitted on 3 Dec 2019)

Abstract: Self attention mechanisms have become a key building block in many state-of-the-art language understanding models. In this paper, we show that the self attention operator can be formulated in terms of 1x1 convolution operations. Following this observation, we propose several novel operators. First, we introduce a 2D version of self attention that is applicable to 2D signals such as images. Second, we present the 1D and 2D Self Attentive Convolutions (SAC) operators, which generalize self attention beyond 1x1 convolutions to 1xm and nxm convolutions, respectively. While 1D and 2D self attention operate on individual words and pixels, SAC operates on m-grams and image patches, respectively. Third, we present a multiscale version of SAC (MSAC), which analyzes the input by employing multiple SAC operators of different filter sizes in parallel. Finally, we explain how MSAC can be utilized for vision and language modeling, and further harness MSAC to form a cross attentive image similarity machinery.
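The abstract's first observation, that self attention can be written with 1x1 convolutions, can be illustrated with a minimal sketch. This is not the paper's formulation, which the abstract does not spell out. It assumes the standard scaled dot-product form of self attention, and the names `conv1x1`, `self_attention`, `wq`, `wk` and `wv` are hypothetical. The point it shows is that a convolution of width 1 sees one position at a time, so the query, key and value projections of self attention are each a 1x1 convolution over the sequence.

```python
import math
import random


def matvec(v, w):
    """Multiply a row vector v (d_in) by a matrix w (d_in x d_out)."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) for j in range(len(w[0]))]


def conv1x1(x, w):
    """Width-1 convolution over a sequence x of shape (length, d_in).

    The kernel covers a single position, so every position is mapped
    independently by the same weights w.
    """
    return [matvec(token, w) for token in x]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(r - m) for r in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self attention (an assumed, standard form).

    Queries, keys and values are all produced by 1x1 convolutions.
    """
    q, k, v = conv1x1(x, wq), conv1x1(x, wk), conv1x1(x, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        # Attention weights of this position over all positions.
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# Illustrative usage with random inputs: 5 positions, 8 input channels,
# 4 output channels. All sizes are arbitrary choices for the example.
rng = random.Random(0)
x = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(5)]
wq, wk, wv = ([[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
              for _ in range(3))
y = self_attention(x, wq, wk, wv)
```

Under the same assumption, replacing `conv1x1` with a wider kernel would make each query, key and value depend on a window of neighboring positions rather than a single one, which is the direction the abstract describes for SAC.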

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:1912.01521 [cs.LG]
	(or arXiv:1912.01521v1 [cs.LG] for this version)

Submission history

From: Oren Barkan
[v1] Tue, 3 Dec 2019 16:51:09 GMT (99kb)
