
Title: Transformer Transforms Salient Object Detection and Camouflaged Object Detection

Abstract: Transformer networks, which originated in machine translation, are particularly good at modeling long-range dependencies within long sequences. They are currently making revolutionary progress in a wide range of vision tasks, from high-level classification to low-level dense prediction. In this paper, we study the application of transformer networks to salient object detection (SOD). Specifically, we adopt a dense transformer backbone for fully supervised RGB image-based SOD, RGB-D image pair-based SOD, and weakly supervised SOD via scribble supervision. As an extension, we also apply our fully supervised model to camouflaged object detection (COD), i.e., camouflaged object segmentation. For the fully supervised models, we use the dense transformer backbone as the feature encoder and design a very simple decoder to produce a one-channel saliency map (or camouflage map for the COD task). For the weakly supervised model, since scribble annotations contain no structure information, we first adopt the recently proposed Gated-CRF loss to effectively model pair-wise relationships for accurate model prediction. Then, we introduce a self-supervised learning strategy that pushes the model to produce scale-invariant predictions, which proves effective both for weakly supervised models and for models trained on small datasets. Extensive experimental results on various SOD and COD tasks (fully supervised RGB image-based SOD, fully supervised RGB-D image pair-based SOD, weakly supervised SOD via scribble supervision, and fully supervised RGB image-based COD) show that transformer networks can transform salient object detection and camouflaged object detection, establishing new benchmarks for each related task.
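The scale-invariance objective mentioned above can be sketched as a label-free consistency loss. This is a minimal illustration under stated assumptions, not the paper's formulation: it assumes the loss compares the model's prediction on a 2x downscaled input with a 2x downscaled version of its prediction on the full-resolution input, using mean absolute error. The scale factor, the average pooling, and the distance are all assumptions, `avg_pool2x` and `scale_consistency_loss` are hypothetical names, and images are single-channel (H, W) arrays for brevity.

```python
import numpy as np


def avg_pool2x(x: np.ndarray) -> np.ndarray:
    """Downscale an (H, W) map by 2 with 2x2 average pooling (H and W assumed even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def scale_consistency_loss(model, image: np.ndarray) -> float:
    """Hypothetical scale-invariance loss (assumed form, not the paper's exact one).

    `model` maps an (H, W) image to an (H, W) saliency map in [0, 1].
    A scale-invariant model should give the same result whether we
    downscale its output or feed it a downscaled input.
    """
    pred_full = model(image)               # prediction at full resolution
    pred_small = model(avg_pool2x(image))  # prediction on the downscaled input
    # Mean absolute difference between the two half-resolution maps.
    return float(np.mean(np.abs(avg_pool2x(pred_full) - pred_small)))
```

Because the loss needs no labels, it can be computed on every training image and added, with some weight, to the scribble-supervised loss. A pointwise affine "model" commutes with average pooling and therefore incurs (numerically) zero loss, whereas a thresholding model does not, which is the kind of scale-dependent behavior the term penalizes.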
Comments: Technical report, 15 pages, 18 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2104.10127 [cs.CV]
  (or arXiv:2104.10127v1 [cs.CV] for this version)

Submission history

From: Jing Zhang
[v1] Tue, 20 Apr 2021 17:12:51 GMT (14704kb,D)
[v2] Sat, 26 Jun 2021 04:21:59 GMT (25450kb,D)
[v3] Wed, 26 Jan 2022 04:29:31 GMT (23750kb,D)
[v4] Mon, 31 Jan 2022 23:26:41 GMT (30685kb,D)
[v5] Fri, 30 Dec 2022 12:12:38 GMT (19966kb,D)
