Iterative Vision-and-Language Navigation

Krantz, Jacob; Banerjee, Shurjo; Zhu, Wang; Corso, Jason; Anderson, Peter; Lee, Stefan; Thomason, Jesse

(Submitted on 6 Oct 2022 (v1), last revised 24 Dec 2023 (this version, v3))

Abstract: We present Iterative Vision-and-Language Navigation (IVLN), a paradigm for evaluating language-guided agents navigating in a persistent environment over time. Existing Vision-and-Language Navigation (VLN) benchmarks erase the agent's memory at the beginning of every episode, testing the ability to perform cold-start navigation with no prior information. However, deployed robots occupy the same environment for long periods of time. The IVLN paradigm addresses this disparity by training and evaluating VLN agents that maintain memory across tours of scenes that consist of up to 100 ordered instruction-following Room-to-Room (R2R) episodes, each defined by an individual language instruction and a target path. We present discrete and continuous Iterative Room-to-Room (IR2R) benchmarks comprising about 400 tours each in 80 indoor scenes. We find that extending the implicit memory of high-performing transformer VLN agents is not sufficient for IVLN, but agents that build maps can benefit from environment persistence, motivating a renewed focus on map-building agents in VLN.
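The difference between the episodic and iterative protocols described in the abstract can be sketched as two evaluation loops. This is only an illustrative sketch, not the authors' implementation: the `Episode`, `Tour`, and `CountingAgent` names, the viewpoint-ID path format, and the choice to clear memory at the start of each tour are all assumptions made for the example. The toy agent does no navigation; it only reports how many earlier episodes it still remembers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Episode:
    instruction: str          # natural-language instruction
    target_path: List[str]    # ordered viewpoint IDs (hypothetical format)


@dataclass
class Tour:
    scene_id: str
    episodes: List[Episode]   # ordered; the abstract gives up to 100 per tour


@dataclass
class CountingAgent:
    """Toy stand-in for a VLN agent; it only tracks what it remembers."""
    memory: List[str] = field(default_factory=list)
    resets: int = 0

    def reset_memory(self) -> None:
        self.memory.clear()
        self.resets += 1

    def run_episode(self, episode: Episode) -> int:
        # Report how many earlier episodes are remembered, then store this one.
        remembered = len(self.memory)
        self.memory.append(episode.instruction)
        return remembered


def evaluate_episodic(agent: CountingAgent, tours: List[Tour]) -> List[int]:
    """Cold-start protocol: memory is erased before every episode."""
    seen = []
    for tour in tours:
        for episode in tour.episodes:
            agent.reset_memory()
            seen.append(agent.run_episode(episode))
    return seen


def evaluate_iterative(agent: CountingAgent, tours: List[Tour]) -> List[int]:
    """Iterative protocol (assumed): memory is erased once per tour, then persists."""
    seen = []
    for tour in tours:
        agent.reset_memory()
        for episode in tour.episodes:
            seen.append(agent.run_episode(episode))
    return seen


tours = [
    Tour("scene_a", [Episode(f"instruction {i}", ["v0", "v1"]) for i in range(3)]),
    Tour("scene_b", [Episode(f"instruction {i}", ["v0", "v1"]) for i in range(2)]),
]
episodic = evaluate_episodic(CountingAgent(), tours)
iterative = evaluate_iterative(CountingAgent(), tours)
print(episodic)   # [0, 0, 0, 0, 0]
print(iterative)  # [0, 1, 2, 0, 1]
```

In the episodic loop the agent never has prior information, while in the iterative loop the remembered count grows within a tour and drops back to zero when a new tour begins.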

Comments: Accepted by CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Robotics (cs.RO)
Cite as: arXiv:2210.03087 [cs.CV] (or arXiv:2210.03087v3 [cs.CV] for this version)

Submission history

From: Wang Zhu
[v1] Thu, 6 Oct 2022 17:46:00 GMT (6460kb,D)
[v2] Wed, 20 Dec 2023 17:24:33 GMT (10159kb,D)
[v3] Sun, 24 Dec 2023 05:37:26 GMT (10159kb,D)
