SOON: Scenario Oriented Object Navigation with Graph-based Exploration
About
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots. Most visual navigation benchmarks, however, focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that describes the route step by step. This approach deviates from real-world problems, in which a human only describes what the object and its surroundings look like and asks the robot to start navigating from anywhere. Accordingly, in this paper, we introduce the Scenario Oriented Object Navigation (SOON) task. In this task, an agent is required to navigate from an arbitrary position in a 3D embodied environment to localize a target following a scene description. As a promising direction for solving this task, we propose a novel graph-based exploration (GBE) method, which models the navigation state as a graph, learns knowledge from that graph, and stabilizes training by learning from sub-optimal trajectories. We also propose a new large-scale benchmark, the From Anywhere to Object (FAO) dataset. To avoid target ambiguity, the descriptions in FAO provide rich semantic scene information, including object attributes, object relationships, region descriptions, and nearby region descriptions. Our experiments reveal that the proposed GBE outperforms various state-of-the-art methods on both the FAO and R2R datasets. The ablation studies on FAO also validate the quality of the dataset.
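The abstract says that GBE models the navigation state as a graph but does not spell out the data structure. The following is only a minimal illustrative sketch, assuming that nodes are viewpoints and edges are navigable links between them; the `NavGraph` class, its methods, and the toy node names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class NavGraph:
    """Hypothetical navigation-state graph (not the paper's implementation)."""
    edges: dict = field(default_factory=dict)   # node id -> set of neighbor ids
    visited: set = field(default_factory=set)   # nodes the agent has stood on

    def observe(self, current, neighbors):
        """Record a visit to `current` and the navigable `neighbors` seen from it."""
        self.visited.add(current)
        self.edges.setdefault(current, set())
        for n in neighbors:
            # Store each link in both directions (undirected graph).
            self.edges[current].add(n)
            self.edges.setdefault(n, set()).add(current)

    def frontier(self):
        """Nodes that have been observed but not yet visited, in sorted order."""
        return sorted(set(self.edges) - self.visited)


# Toy episode: the agent starts at "a", sees "b" and "c", then moves to "b".
graph = NavGraph()
graph.observe("a", ["b", "c"])
graph.observe("b", ["a", "d"])
print(graph.frontier())  # candidates the agent could still explore
```

Keeping observed-but-unvisited nodes in the graph is one way an agent starting from an arbitrary position could choose among all known exploration candidates rather than only its immediate neighbors.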
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Vision-Language Navigation | R2R Unseen (test) | SR | 53 | 116 |
| Vision-and-Language Navigation | SOON (val unseen) | SPL | 13.34 | 16 |
| Vision-and-Language Navigation | SOON unseen house (test) | Success Rate | 12.9 | 10 |
| Vision-and-Language Navigation | SOON seen house (val) | SR | 7.63e+3 | 9 |
| Vision-Language Navigation | R2R Unseen House (val) | Navigation Error (NE) | 5.2 | 9 |
| Vision-and-Language Navigation | SOON seen instruction (val) | SR | 98.4 | 8 |
| Scenario Oriented Object Navigation | FAO Unseen House (test) | OSR | 19.5 | 7 |
| Scenario Oriented Object Navigation | FAO Seen Instruction (val) | OSR | 9.86e+3 | 6 |
| Scenario Oriented Object Navigation | FAO Seen House (val) | OSR | 7.30e+3 | 6 |
| Vision-and-Language Navigation | SOON (test) | SPL | 9.23 | 4 |
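
SR and SPL in the table above are commonly read as success rate and success weighted by path length. As a rough sketch of how these two metrics are usually computed in navigation benchmarks, assuming per-episode success flags and positive path lengths are available, they can be written as below. The function names and the toy episodes are hypothetical, and the exact success criterion behind each leaderboard row is not stated here.

```python
def success_rate(successes):
    """Fraction of episodes that ended in success (flags are 0 or 1)."""
    return sum(successes) / len(successes)


def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by path length.

    Each successful episode contributes shortest / max(taken, shortest),
    so detours lower the score; failed episodes contribute 0.
    Assumes all lengths are positive.
    """
    total = 0.0
    for s, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        total += s * shortest / max(taken, shortest)
    return total / len(successes)


# Three toy episodes: a success with a detour, a failure, an optimal success.
episode_success = [1, 0, 1]
shortest = [10.0, 8.0, 5.0]
taken = [20.0, 8.0, 5.0]

print(success_rate(episode_success))
print(spl(episode_success, shortest, taken))
```

In the toy data, the first episode succeeds but takes a path twice the shortest length, so it counts fully toward SR but only half toward SPL.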