Transformer Transforms Salient Object Detection and Camouflaged Object Detection

Mao, Yuxin; Zhang, Jing; Wan, Zhexiong; Dai, Yuchao; Li, Aixuan; Lv, Yunqiu; Tian, Xinyu; Fan, Deng-Ping; Barnes, Nick

(Submitted on 20 Apr 2021 (this version), latest version 30 Dec 2022 (v5))

Abstract: Transformer networks, which originated in machine translation, are particularly good at modeling long-range dependencies within a long sequence. They are currently making revolutionary progress in various vision tasks, ranging from high-level classification tasks to low-level dense prediction tasks. In this paper, we study the application of transformer networks to salient object detection (SOD). Specifically, we adopt the dense transformer backbone for fully supervised RGB image based SOD, RGB-D image pair based SOD, and weakly supervised SOD via scribble supervision. As an extension, we also apply our fully supervised model to the task of camouflaged object detection (COD) for camouflaged object segmentation. For the fully supervised models, we use the dense transformer backbone as the feature encoder and design a very simple decoder to produce a one-channel saliency map (or camouflage map for the COD task). For the weakly supervised model, as the scribble annotation contains no structure information, we first adopt the recently proposed Gated-CRF loss to effectively model pair-wise relationships for accurate model prediction. Then, we introduce a self-supervised learning strategy to push the model to produce scale-invariant predictions, which has proven effective for weakly supervised models and models trained on small training datasets. Extensive experimental results on various SOD and COD tasks (fully supervised RGB image based SOD, fully supervised RGB-D image pair based SOD, weakly supervised SOD via scribble supervision, and fully supervised RGB image based COD) illustrate that transformer networks can transform salient object detection and camouflaged object detection, leading to new benchmarks for each related task.
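
The scale-invariance idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the self-supervised term, so everything below is an assumption made for illustration: the function names (downsample2x, scale_consistency_loss, toy_model) are hypothetical, the 2x average pooling and the mean absolute difference are arbitrary choices, and the toy model is only a stand-in for the transformer encoder and decoder.

```python
# Illustrative sketch only: names and the exact loss form are assumptions,
# not taken from the paper.

def downsample2x(grid):
    """Average-pool a 2D list of floats by a factor of 2 (even sizes assumed)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[i][j] + grid[i][j + 1] + grid[i + 1][j] + grid[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def scale_consistency_loss(model, image):
    """Mean absolute difference between the downsampled full-size prediction
    and the prediction made directly on the downsampled image."""
    pred_full = model(image)                  # prediction at the original scale
    pred_small = model(downsample2x(image))   # prediction at half scale
    target = downsample2x(pred_full)          # bring both to the same size
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(target, pred_small)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def toy_model(image):
    """Hypothetical stand-in for encoder + decoder: clamps each pixel to [0, 1]."""
    return [[min(1.0, max(0.0, v)) for v in row] for row in image]


if __name__ == "__main__":
    consistent = [[0.2, 0.4], [0.6, 0.8]]    # clamp is the identity here
    inconsistent = [[2.0, 0.0], [0.0, 0.0]]  # clamp changes the prediction
    print(scale_consistency_loss(toy_model, consistent))
    print(scale_consistency_loss(toy_model, inconsistent))
```

A loss of this kind is zero when predictions at the two scales agree and grows as they diverge, so minimizing it needs no annotation beyond the input image.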

Comments:	Technical report, 15 pages, 18 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2104.10127 [cs.CV]
	(or arXiv:2104.10127v1 [cs.CV] for this version)

Submission history

From: Jing Zhang
[v1] Tue, 20 Apr 2021 17:12:51 GMT (14704kb,D)
[v2] Sat, 26 Jun 2021 04:21:59 GMT (25450kb,D)
[v3] Wed, 26 Jan 2022 04:29:31 GMT (23750kb,D)
[v4] Mon, 31 Jan 2022 23:26:41 GMT (30685kb,D)
[v5] Fri, 30 Dec 2022 12:12:38 GMT (19966kb,D)
