Computer Science > Computer Vision and Pattern Recognition
Title: BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions
(Submitted on 28 Mar 2020 (this version), latest version 23 Jun 2022 (v3))
Abstract: We present BiLingUNet, a state-of-the-art model for image segmentation using referring expressions. BiLingUNet uses language to customize visual filters and outperforms approaches that concatenate a linguistic representation to the visual input. We find that using language to modulate both bottom-up and top-down visual processing works better than just making the top-down processing language-conditional. We argue that common 1x1 language-conditional filters cannot represent relational concepts and experimentally demonstrate that wider filters work better. Our model achieves state-of-the-art performance on four referring expression datasets.
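The claim about filter width can be illustrated with a small sketch. It is not the paper's architecture: it assumes a single-channel feature map, a plain linear map from a language vector to filter weights, and hypothetical helper names (`make_filter`, `conv2d`) and weight values chosen only for illustration. It shows that the response of a 1x1 language-conditional filter at a location depends only on that location, while a 3x3 filter generated from the same language vector also responds to neighboring locations, which is what a relation such as "left of" would need.

```python
# Illustrative sketch only; names and weights are hypothetical, not from the paper.

def make_filter(lang_vec, weights, k):
    """Map a language vector to a k x k filter with one linear layer.

    weights has k*k rows, each the same length as lang_vec.
    """
    flat = [sum(w * x for w, x in zip(row, lang_vec)) for row in weights]
    return [flat[i * k:(i + 1) * k] for i in range(k)]


def conv2d(image, kernel):
    """Same-size 2D cross-correlation with zero padding (single channel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(k):
                for dx in range(k):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy][dx] * image[yy][xx]
            out[y][x] = acc
    return out


# A toy language embedding (assumed values).
lang_vec = [1.0, -0.5, 2.0]

# Linear maps from the language vector to 1x1 and 3x3 filter weights.
w_1x1 = [[0.5, 1.0, 0.25]]
w_3x3 = [[0.1 * (i + 1), 0.0, 0.05] for i in range(9)]

filter_1x1 = make_filter(lang_vec, w_1x1, 1)
filter_3x3 = make_filter(lang_vec, w_3x3, 3)

# Two 5x5 feature maps that agree at the center and differ only at its right neighbor.
image_a = [[0.0] * 5 for _ in range(5)]
image_a[2][2] = 1.0
image_b = [row[:] for row in image_a]
image_b[2][3] = 1.0

out1_a, out1_b = conv2d(image_a, filter_1x1), conv2d(image_b, filter_1x1)
out3_a, out3_b = conv2d(image_a, filter_3x3), conv2d(image_b, filter_3x3)

# The 1x1 response at the center cannot see the neighbor; the 3x3 response can.
same_under_1x1 = out1_a[2][2] == out1_b[2][2]
differs_under_3x3 = out3_a[2][2] != out3_b[2][2]
```

In this toy setting `same_under_1x1` and `differs_under_3x3` are both true: only the wider filter distinguishes the two maps at the center location.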
Submission history
From: İlker Kesen
[v1] Sat, 28 Mar 2020 07:54:03 GMT (749kb,D)
[v2] Mon, 18 Oct 2021 11:30:12 GMT (1654kb,D)
[v3] Thu, 23 Jun 2022 14:02:40 GMT (1524kb,D)