Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer

Huang, Zilong; Ben, Youcheng; Luo, Guozhong; Cheng, Pei; Yu, Gang; Fu, Bin

(Submitted on 7 Jun 2021)

Abstract: Very recently, Window-based Transformers, which compute self-attention within non-overlapping local windows, have demonstrated promising results on image classification, semantic segmentation, and object detection. However, little study has been devoted to cross-window connections, which are the key element for improving representation ability. In this work, we revisit the spatial shuffle as an efficient way to build connections among windows. As a result, we propose a new vision transformer, named Shuffle Transformer, which is highly efficient and easy to implement by modifying two lines of code. Furthermore, depth-wise convolution is introduced to complement the spatial shuffle by enhancing neighbor-window connections. The proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification, object detection, and semantic segmentation. Code will be released for reproduction.
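
To make the idea of a spatial shuffle concrete, here is a minimal NumPy sketch. It is not the authors' released code. It assumes that "shuffle" means each window gathers tokens sampled at a fixed stride across the whole feature map instead of a contiguous patch; the function name `window_partition` and its parameters are hypothetical. Under that assumption, switching between local and shuffled windows only changes the reshape and transpose pattern, which is consistent with the abstract's remark about modifying two lines of code.

```python
import numpy as np


def window_partition(x, ws, shuffle=False):
    """Group an (H, W) grid of token ids into windows of ws * ws tokens.

    Hypothetical helper for illustration only.
    Returns an array of shape (num_windows, ws * ws).
    """
    H, W = x.shape
    nh, nw = H // ws, W // ws  # number of windows along each axis
    if shuffle:
        # Assumed shuffle: row index = i * nh + a, so the ws tokens of a
        # window are spaced nh rows apart (and nw columns apart).
        x = x.reshape(ws, nh, ws, nw).transpose(1, 3, 0, 2)
    else:
        # Local windows: row index = a * ws + i, so tokens are contiguous.
        x = x.reshape(nh, ws, nw, ws).transpose(0, 2, 1, 3)
    # Both branches now have layout (nh, nw, ws, ws).
    return x.reshape(nh * nw, ws * ws)


grid = np.arange(16).reshape(4, 4)            # token ids on a 4 x 4 map
local = window_partition(grid, ws=2)          # contiguous 2 x 2 windows
shuffled = window_partition(grid, ws=2, shuffle=True)  # strided windows

print(local[0])     # tokens from the top-left 2 x 2 patch
print(shuffled[0])  # tokens drawn from all four quadrants of the map
```

In this toy case the first local window holds tokens 0, 1, 4, 5, while the first shuffled window holds tokens 0, 2, 8, 10, so self-attention inside a shuffled window mixes information that local windows keep apart.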

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2106.03650 [cs.CV]
	(or arXiv:2106.03650v1 [cs.CV] for this version)

Submission history

From: Zilong Huang [view email]
[v1] Mon, 7 Jun 2021 14:22:07 GMT (255kb,D)
