Can CNNs Be More Robust Than Transformers?

Wang, Zeyu; Bai, Yutong; Zhou, Yuyin; Xie, Cihang

(Submitted on 7 Jun 2022 (v1), last revised 6 Mar 2023 (this version, v2))

Abstract: The recent success of Vision Transformers is shaking the long dominance of Convolutional Neural Networks (CNNs) in image recognition for a decade. Specifically, in terms of robustness on out-of-distribution samples, recent research finds that Transformers are inherently more robust than CNNs, regardless of different training setups. Moreover, it is believed that such superiority of Transformers should largely be credited to their self-attention-like architectures per se. In this paper, we question that belief by closely examining the design of Transformers. Our findings lead to three highly effective architecture designs for boosting robustness, yet simple enough to be implemented in several lines of code, namely a) patchifying input images, b) enlarging kernel size, and c) reducing activation layers and normalization layers. Bringing these components together, we are able to build pure CNN architectures without any attention-like operations that are as robust as, or even more robust than, Transformers. We hope this work can help the community better understand the design of robust neural architectures. The code is publicly available at this https URL
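
To make the first of the three designs concrete, the following is a minimal sketch of what "patchifying" an input means in general: cutting an image into non-overlapping square tiles. It is an illustration in plain Python, not the authors' code. The function name `patchify`, the toy 4x4 image, and the patch size of 2 are all assumptions chosen for the example. In a network this step is commonly realized as a convolution whose kernel size equals its stride, which the sketch leaves out.

```python
def patchify(image, patch):
    """Split a 2-D image (a list of rows) into non-overlapping patch x patch tiles.

    Returns a grid of tiles in row-major order; each tile is a list of rows.
    Hypothetical helper for illustration only.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    return [
        [
            # Take `patch` rows starting at `top`, and `patch` columns starting at `left`.
            [row[left:left + patch] for row in image[top:top + patch]]
            for left in range(0, width, patch)
        ]
        for top in range(0, height, patch)
    ]


# A toy 4x4 single-channel "image" with pixel values 0..15 (assumed data).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = patchify(image, patch=2)

print(len(tiles), len(tiles[0]))  # 2 2
print(tiles[0][1])                # [[2, 3], [6, 7]]
```

Each tile would then be treated as one input unit by the first layer, so a larger patch size means fewer, coarser units for the rest of the network to process.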

Comments:	ICLR2023. Code is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2206.03452 [cs.CV]
	(or arXiv:2206.03452v2 [cs.CV] for this version)

Submission history

From: Zeyu Wang
[v1] Tue, 7 Jun 2022 17:17:07 GMT (312kb,D)
[v2] Mon, 6 Mar 2023 05:51:33 GMT (320kb,D)
