Title: Local Slot Attention for Vision-and-Language Navigation

Abstract: Vision-and-language navigation (VLN), a frontier research area that aims to pave the way for general-purpose robots, has been a hot topic in the computer vision and natural language processing communities. The VLN task requires an agent to navigate to a goal location following natural language instructions in unfamiliar environments.
Recently, transformer-based models have achieved significant improvements on the VLN task, since the attention mechanism in the transformer architecture can better integrate inter- and intra-modal information of vision and language.
However, there exist two problems in current transformer-based models.
1) The models process each view independently without taking the integrity of the objects into account.
2) During the self-attention operation in the visual modality, views that are spatially distant can be interwoven with each other without explicit restriction. This kind of mixing may introduce extra noise instead of useful information.
To address these issues, we propose 1) a slot-attention-based module to incorporate information from segmentation of the same object, and 2) a local attention mask mechanism to limit the visual attention span. The proposed modules can be easily plugged into any VLN architecture, and we use the Recurrent VLN-Bert as our base model. Experiments on the R2R dataset show that our model achieves state-of-the-art results.
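The idea of limiting the visual attention span with a local mask can be illustrated with a minimal sketch. It is not the authors' implementation: the 12 x 3 panorama grid, the `radius` parameter, and the function names `local_attention_mask` and `masked_softmax` are all assumptions made for illustration, since the abstract does not specify how the mask is built.

```python
import math

def local_attention_mask(n_headings=12, n_elevations=3, radius=1):
    """Return mask[i][j] == True when view i may attend to view j.

    Assumption: views are laid out row by row on a grid of
    n_elevations x n_headings, and headings wrap around the panorama.
    """
    n = n_headings * n_elevations
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        elev_i, head_i = divmod(i, n_headings)
        for j in range(n):
            elev_j, head_j = divmod(j, n_headings)
            # Circular distance between headings (0 and 11 are neighbours).
            dh = abs(head_i - head_j)
            dh = min(dh, n_headings - dh)
            # Elevation does not wrap, so plain distance is used.
            de = abs(elev_i - elev_j)
            mask[i][j] = dh <= radius and de <= radius
    return mask

def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked-out views."""
    kept = [s for s, m in zip(scores, mask_row) if m]
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

mask = local_attention_mask()
weights = masked_softmax([0.0] * 36, mask[0])
```

With this layout, each view attends only to itself and its immediate grid neighbours, so spatially distant views receive exactly zero attention weight rather than a small nonzero one.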
Comments: ICMR 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV)
DOI: 10.1145/3512527.3531366
Cite as: arXiv:2206.08645 [cs.CV]
  (or arXiv:2206.08645v2 [cs.CV] for this version)

Submission history

From: Yifeng Zhuang
[v1] Fri, 17 Jun 2022 09:21:26 GMT (3992kb,D)
[v2] Wed, 22 Jun 2022 02:32:32 GMT (3992kb,D)