Title: BiLingUNet: Image Segmentation by Modulating Top-Down and Bottom-Up Visual Processing with Referring Expressions

Abstract: We present BiLingUNet, a state-of-the-art model for image segmentation using referring expressions. BiLingUNet uses language to customize visual filters and outperforms approaches that concatenate a linguistic representation to the visual input. We find that using language to modulate both bottom-up and top-down visual processing works better than just making the top-down processing language-conditional. We argue that common 1x1 language-conditional filters cannot represent relational concepts and experimentally demonstrate that wider filters work better. Our model achieves state-of-the-art performance on four referring expression datasets.
Comments: 18 pages, 3 figures, submitted to ECCV 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2003.12739 [cs.CV]
  (or arXiv:2003.12739v1 [cs.CV] for this version)
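
The abstract's claim about 1x1 versus wider language-conditional filters can be illustrated with a minimal sketch. This is not the BiLingUNet implementation: the linear projection from a language vector to kernel weights, the helper names (`make_proj`, `make_kernel`, `conv_at`), and all toy values are assumptions made purely for illustration. It shows only that the response of a 1x1 kernel at a location ignores neighbouring features, whereas the response of a 3x3 kernel can depend on them.

```python
# Toy illustration (assumed setup, not the paper's code): a language vector is
# linearly projected to convolution kernel weights, then applied to a feature map.

def make_proj(rows, dim):
    """Hypothetical fixed projection matrix with deterministic, nonzero weights."""
    return [[(i + 1) * 0.1 + j * 0.01 for j in range(dim)] for i in range(rows)]

def make_kernel(lang_vec, channels, k):
    """Map a language vector to one filter of shape [channels][k][k]."""
    proj = make_proj(channels * k * k, len(lang_vec))
    flat = [sum(w * x for w, x in zip(row, lang_vec)) for row in proj]
    return [[[flat[(c * k + dy) * k + dx] for dx in range(k)]
             for dy in range(k)]
            for c in range(channels)]

def conv_at(feat, kernel, y, x):
    """Filter response at (y, x); feat is [channels][H][W], zero padding."""
    k = len(kernel[0])
    r = k // 2
    h, w = len(feat[0]), len(feat[0][0])
    total = 0.0
    for c, plane in enumerate(kernel):
        for dy in range(k):
            for dx in range(k):
                yy, xx = y + dy - r, x + dx - r
                if 0 <= yy < h and 0 <= xx < w:
                    total += plane[dy][dx] * feat[c][yy][xx]
    return total

def blank(channels, h, w):
    """All-zero feature map of shape [channels][h][w]."""
    return [[[0.0] * w for _ in range(h)] for _ in range(channels)]

lang_vec = [1.0, 0.5, -0.25]   # stand-in for an encoded referring expression
C = 2

# Two scenes: identical at the centre, but scene_b has an object to its left.
scene_a = blank(C, 3, 3)
scene_a[0][1][1] = 1.0
scene_b = blank(C, 3, 3)
scene_b[0][1][1] = 1.0
scene_b[0][1][0] = 1.0

k1 = make_kernel(lang_vec, C, 1)   # 1x1 language-conditional filter
k3 = make_kernel(lang_vec, C, 3)   # 3x3 language-conditional filter

r1_a, r1_b = conv_at(scene_a, k1, 1, 1), conv_at(scene_b, k1, 1, 1)
r3_a, r3_b = conv_at(scene_a, k3, 1, 1), conv_at(scene_b, k3, 1, 1)

print("1x1 responses equal:", r1_a == r1_b)
print("3x3 responses equal:", r3_a == r3_b)
```

In this toy setting the 1x1 filter cannot distinguish the two scenes at the centre location, because it sees only that location's features; the 3x3 filter can, because its receptive field covers the neighbour to the left.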

Submission history

From: İlker Kesen
[v1] Sat, 28 Mar 2020 07:54:03 GMT (749kb,D)
[v2] Mon, 18 Oct 2021 11:30:12 GMT (1654kb,D)
[v3] Thu, 23 Jun 2022 14:02:40 GMT (1524kb,D)
