X-volution: On the unification of convolution and self-attention

Chen, Xuanhong; Wang, Hang; Ni, Bingbing

(Submitted on 4 Jun 2021 (v1), last revised 7 Jun 2021 (this version, v2))

Abstract: Convolution and self-attention act as two fundamental building blocks in deep neural networks: the former extracts local image features in a linear way, while the latter non-locally encodes high-order contextual relationships. Though the two are essentially complementary (first-order vs. high-order), state-of-the-art architectures, i.e., CNNs or transformers, lack a principled way to apply both operations simultaneously in a single computational module, owing to their heterogeneous computing patterns and the excessive cost of the global dot-product for visual tasks. In this work, we theoretically derive a global self-attention approximation scheme, which approximates self-attention via a convolution operation on transformed features. Based on this approximation scheme, we establish a multi-branch elementary module composed of both convolution and self-attention operations, capable of unifying local and non-local feature interactions. Importantly, once trained, this multi-branch module can be conditionally converted into a single standard convolution operation via structural re-parameterization, yielding a pure convolution-style operator named X-volution, ready to be plugged into any modern network as an atomic operation. Extensive experiments demonstrate that the proposed X-volution achieves highly competitive visual understanding improvements (+1.2% top-1 accuracy on ImageNet classification, +1.7 box AP and +1.5 mask AP on COCO detection and segmentation).
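The structural re-parameterization step mentioned in the abstract can be illustrated generically. This is a toy sketch under stated assumptions, not the authors' X-volution conversion: it assumes two parallel branches that are both plain linear convolutions on a single channel (a 3x3 and a 1x1, stride 1, zero "same" padding) and checks that summing their outputs equals one 3x3 convolution whose kernel is the 3x3 kernel plus the zero-padded 1x1 kernel. The helper names (conv2d_same, pad_kernel, merge_branches, add) and the kernel values are hypothetical. The paper's attention branch, approximated as a convolution on transformed features, and the conditions under which it can be folded in are not described in the abstract and are not modeled here.

```python
import math
from typing import List

Matrix = List[List[float]]


def conv2d_same(x: Matrix, k: Matrix) -> Matrix:
    """Single-channel, stride-1 2-D cross-correlation with zero 'same' padding.

    Assumes an odd-sized square kernel so the output keeps the input's shape.
    """
    h, w = len(x), len(x[0])
    size = len(k)
    pad = size // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(size):
                for b in range(size):
                    r, c = i + a - pad, j + b - pad
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the image
                        acc += k[a][b] * x[r][c]
            out[i][j] = acc
    return out


def pad_kernel(k: Matrix, size: int) -> Matrix:
    """Zero-pad an odd-sized kernel symmetrically so it becomes size x size."""
    off = (size - len(k)) // 2
    out = [[0.0] * size for _ in range(size)]
    for a, row in enumerate(k):
        for b, v in enumerate(row):
            out[a + off][b + off] = v
    return out


def merge_branches(kernels: List[Matrix]) -> Matrix:
    """Fold parallel linear conv branches into one kernel by summing padded kernels."""
    size = max(len(k) for k in kernels)
    merged = [[0.0] * size for _ in range(size)]
    for k in kernels:
        padded = pad_kernel(k, size)
        for a in range(size):
            for b in range(size):
                merged[a][b] += padded[a][b]
    return merged


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped feature maps (the branch fusion)."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


# Toy 4x4 single-channel feature map and two hypothetical branch kernels.
x = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0, 2.0],
     [2.0, 1.0, 0.0, 1.0],
     [1.0, 0.0, 2.0, 3.0]]
k3 = [[0.1, 0.2, 0.0],
      [0.0, 0.5, -0.3],
      [0.2, 0.0, 0.1]]
k1 = [[0.7]]

# Training-time view: two branches evaluated separately, then summed.
multi_branch = add(conv2d_same(x, k3), conv2d_same(x, k1))
# Inference-time view: one convolution with the merged kernel.
single_branch = conv2d_same(x, merge_branches([k3, k1]))

match = all(
    math.isclose(p, q, abs_tol=1e-9)
    for rp, rq in zip(multi_branch, single_branch)
    for p, q in zip(rp, rq)
)
print(match)  # True
```

The merge works because convolution is linear in its kernel: conv(x, k1) + conv(x, k2) = conv(x, k1 + k2) once both kernels share a size and centre. An exact softmax self-attention branch is not linear in this sense and cannot be folded this way, which is presumably why the abstract first approximates self-attention as a convolution on transformed features.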

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.02253 [cs.CV]
	(or arXiv:2106.02253v2 [cs.CV] for this version)

Submission history

From: Hang Wang
[v1] Fri, 4 Jun 2021 04:32:02 GMT (594kb,D)
[v2] Mon, 7 Jun 2021 09:03:46 GMT (594kb,D)