Transformer-based Network for RGB-D Saliency Detection

Wang, Yue; Jia, Xu; Zhang, Lu; Li, Yuke; Elder, James; Lu, Huchuan

(Submitted on 1 Dec 2021)

Abstract: RGB-D saliency detection integrates information from both RGB images and depth maps to improve the prediction of salient regions under challenging conditions. The key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities. Previous approaches tend to apply multi-scale and multi-modal fusion separately via local operations, which fails to capture long-range dependencies. Here we propose a transformer-based network to address this issue. Our proposed architecture is composed of two modules: a transformer-based within-modality feature enhancement module (TWFEM) and a transformer-based feature fusion module (TFFM). TFFM performs thorough feature fusion by integrating features from multiple scales and both modalities over all positions simultaneously. Before TFFM, TWFEM enhances the features at each scale by selecting and integrating complementary information from the other scales within the same modality. We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement, and that it simplifies the model design. Extensive experimental results on six benchmark datasets demonstrate that our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
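
The two modules described in the abstract can be illustrated with a toy sketch. The abstract does not specify the actual layers, so everything below is an assumption made for illustration: a single attention head, identity query/key/value projections, no positional embeddings, and the hypothetical names `self_attention`, `enhance_within_modality`, and `fuse`. It only shows the general idea of attending over tokens from all scales at once (within one modality for enhancement, across both modalities for fusion), not the authors' implementation.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity projections (a simplification).

    tokens: list of N vectors of length d. Returns N vectors of length d.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in tokens:
        # Every token is compared with every other token, so dependencies
        # are not limited to a local neighbourhood.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out


def enhance_within_modality(scales):
    """Assumed stand-in for TWFEM: attend across all scales of ONE modality,
    then split the result back into per-scale token lists."""
    tokens = [t for s in scales for t in s]
    attended = self_attention(tokens)
    out, start = [], 0
    for s in scales:
        out.append(attended[start:start + len(s)])
        start += len(s)
    return out


def fuse(rgb_scales, depth_scales):
    """Assumed stand-in for TFFM: attend jointly over all scales of BOTH
    modalities, flattened into one token sequence."""
    tokens = [t for s in rgb_scales + depth_scales for t in s]
    return self_attention(tokens)


def toy_features(n_tokens, d=8):
    """Random placeholder features; a real model would use backbone outputs."""
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n_tokens)]


random.seed(0)
rgb = [toy_features(16), toy_features(4)]    # e.g. flattened 4x4 and 2x2 maps
depth = [toy_features(16), toy_features(4)]

rgb_enh = enhance_within_modality(rgb)
depth_enh = enhance_within_modality(depth)
fused = fuse(rgb_enh, depth_enh)
print(len(fused), len(fused[0]))  # 40 tokens of dimension 8
```

In this sketch the same attention routine serves both steps, which mirrors the abstract's point that one uniform operation can handle enhancement and fusion; a practical model would add learned projections, multiple heads, and a decoder that maps the fused tokens to a saliency map.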

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2112.00582 [cs.CV]
	(or arXiv:2112.00582v1 [cs.CV] for this version)

Submission history

From: Yue Wang
[v1] Wed, 1 Dec 2021 15:53:58 GMT (14714kb,D)
