On Efficient Transformer-Based Image Pre-training for Low-Level Vision

Li, Wenbo; Lu, Xin; Qian, Shengju; Lu, Jiangbo; Zhang, Xiangyu; Jia, Jiaya

(Submitted on 19 Dec 2021 (v1), last revised 21 Mar 2022 (this version, v2))

Abstract: Pre-training has set numerous state-of-the-art results in high-level computer vision, yet few attempts have been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based pre-training regimes that boost various low-level tasks. To comprehensively diagnose the influence of pre-training, we design a set of principled evaluation tools that uncover its effects on internal representations. The observations demonstrate that pre-training plays strikingly different roles across low-level tasks. For example, pre-training introduces more local information to higher layers in super-resolution (SR), yielding significant performance gains, while it hardly affects internal feature representations in denoising, resulting in limited gains. Further, we explore different methods of pre-training, revealing that multi-related-task pre-training is more effective and data-efficient than the alternatives. Finally, we extend our study to varying data scales and model sizes, as well as comparisons between transformer-based and CNN-based architectures. Based on the study, we develop state-of-the-art models for multiple low-level tasks. Code is released at this https URL

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2112.10175 [cs.CV]
	(or arXiv:2112.10175v2 [cs.CV] for this version)

Submission history

From: Wenbo Li
[v1] Sun, 19 Dec 2021 15:50:48 GMT (7997kb,D)
[v2] Mon, 21 Mar 2022 17:32:08 GMT (12142kb,D)
