HM3D-OVON: A Dataset and Benchmark for Open-Vocabulary Object Goal Navigation
About
We present the Habitat-Matterport 3D Open Vocabulary Object Goal Navigation dataset (HM3D-OVON), a large-scale benchmark that broadens the scope and semantic range of prior Object Goal Navigation (ObjectNav) benchmarks. Leveraging the HM3DSem dataset, HM3D-OVON incorporates over 15k annotated instances of household objects across 379 distinct categories, derived from photo-realistic 3D scans of real-world environments. In contrast to earlier ObjectNav datasets, which limit goal objects to a predefined set of 6-20 categories, HM3D-OVON supports training and evaluating models on an open set of goals defined through free-form language at test time. Through this open-vocabulary formulation, HM3D-OVON encourages progress towards learning visuo-semantic navigation behaviors that can search for any object specified by text. Additionally, we systematically evaluate and compare several types of approaches on HM3D-OVON. We find that HM3D-OVON can be used to train an open-vocabulary ObjectNav agent that both achieves higher performance and is more robust to localization and actuation noise than the state-of-the-art ObjectNav approach. We hope that our benchmark and baseline results will drive interest in developing embodied agents that can navigate real-world spaces to find household objects specified through free-form language, taking a step towards more flexible and human-like semantic visual navigation. Code and videos are available at naoki.io/ovon.
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Goal Navigation | HM3D-OVON Seen (val) | SR 38.5 | 44 |
| Object Goal Navigation | HM3D-OVON unseen (val) | SR 37.1 | 43 |
| Object Goal Navigation | HM3D-OVON Seen-Synonyms (val) | SR 39 | 35 |
| Object Navigation | HM3D v1 (val) | SR 57.6 | 32 |
| Open-set ObjectGoal Navigation | HM3D-OVON unseen (val) | SR 37.1 | 28 |
| Open-Vocabulary Object Goal Navigation | HM3D-OVON (val-seen) | SR 41.3 | 21 |
| Open-Vocabulary Object Goal Navigation | HM3D-OVON seen-syn (val) | SR 39 | 21 |
| Object Navigation | HM3D v2 (val) | SR 31.6 | 19 |
| Object Goal Navigation | HM3D OVON | SR 37.1 | 11 |
| Open-Vocabulary Object Goal Navigation | HM3D OVON (test) | SR 37.1 | 7 |
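
The SR column reports success rate. As a rough illustration only, the sketch below computes a success rate as the percentage of episodes marked successful. The function name `success_rate` and the episode outcomes are hypothetical and are not taken from the benchmark code; how an episode is judged successful is defined by the benchmark itself and is not modeled here.

```python
from typing import Iterable


def success_rate(outcomes: Iterable[bool]) -> float:
    """Return the percentage (0-100) of episodes marked successful."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one episode")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical outcomes for eight episodes (illustrative only, not benchmark data).
toy_outcomes = [True, False, False, True, False, True, False, False]
toy_sr = success_rate(toy_outcomes)
```

Under this reading, an SR of 37.1 would correspond to roughly 37 successful episodes out of every 100 evaluated.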