We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CV

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computer Vision and Pattern Recognition

Title: Is Mapping Necessary for Realistic PointGoal Navigation?

Abstract: Can an autonomous agent navigate in a new environment without building an explicit map?
For the task of PointGoal navigation ('Go to $\Delta x$, $\Delta y$') under idealized settings (no RGB-D and actuation noise, perfect GPS+Compass), the answer is a clear 'yes' - map-less neural models composed of task-agnostic components (CNNs and RNNs) trained with large-scale reinforcement learning achieve 100% Success on a standard dataset (Gibson). However, for PointNav in a realistic setting (RGB-D and actuation noise, no GPS+Compass), this is an open question; one we tackle in this paper. The strongest published result for this task is 71.7% Success.
First, we identify the main (perhaps, only) cause of the drop in performance: the absence of GPS+Compass. An agent with perfect GPS+Compass faced with RGB-D sensing and actuation noise achieves 99.8% Success (Gibson-v2 val). This suggests that (to paraphrase a meme) robust visual odometry is all we need for realistic PointNav; if we can achieve that, we can ignore the sensing and actuation noise.
With that as our operating hypothesis, we scale the dataset and model size, and develop human-annotation-free data-augmentation techniques to train models for visual odometry. We advance the state of art on the Habitat Realistic PointNav Challenge from 71% to 94% Success (+23, 31% relative) and 53% to 74% SPL (+21, 40% relative). While our approach does not saturate or 'solve' this dataset, this strong improvement combined with promising zero-shot sim2real transfer (to a LoCoBot) provides evidence consistent with the hypothesis that explicit mapping may not be necessary for navigation, even in a realistic setting.
Comments: Corrected typos in the Abstract
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2206.00997 [cs.CV]
  (or arXiv:2206.00997v2 [cs.CV] for this version)

Submission history

From: Ruslan Partsey [view email]
[v1] Thu, 2 Jun 2022 11:37:27 GMT (47713kb,D)
[v2] Tue, 7 Jun 2022 08:19:33 GMT (47713kb,D)

Link back to: arXiv, form interface, contact.