ReMamber: Referring Image Segmentation with Mamba Twister

Yang, Yuhuan; Ma, Chaofan; Yao, Jiangchao; Zhong, Zhun; Zhang, Ya; Wang, Yanfeng

Full-text links:

Download:

Current browse context:

cs.CV

< prev | next >

new | recent | 2403

Computer Science > Computer Vision and Pattern Recognition

Title: ReMamber: Referring Image Segmentation with Mamba Twister

Authors: Yuhuan Yang, Chaofan Ma, Jiangchao Yao, Zhun Zhong, Ya Zhang, Yanfeng Wang

(Submitted on 26 Mar 2024)

Abstract: Referring Image Segmentation (RIS) leveraging transformers has achieved great success on the interpretation of complex visual-language tasks. However, the quadratic computation cost makes it resource-consuming in capturing long-range visual-language dependencies. Fortunately, Mamba addresses this with efficient linear complexity in processing. However, directly applying Mamba to multi-modal interactions presents challenges, primarily due to inadequate channel interactions for the effective fusion of multi-modal data. In this paper, we propose ReMamber, a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block. The Mamba Twister explicitly models image-text interaction, and fuses textual and visual features through its unique channel and spatial twisting mechanism. We achieve the state-of-the-art on three challenging benchmarks. Moreover, we conduct thorough analyses of ReMamber and discuss other fusion designs using Mamba. These provide valuable perspectives for future research.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2403.17839 [cs.CV]
	(or arXiv:2403.17839v1 [cs.CV] for this version)

Submission history

From: Yuhuan Yang [view email]
[v1] Tue, 26 Mar 2024 16:27:37 GMT (6558kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2403.17839

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Computer Vision and Pattern Recognition

Title: ReMamber: Referring Image Segmentation with Mamba Twister

Submission history