FOM-Nav: Frontier-Object Maps for Object Goal Navigation
About
This paper addresses the Object Goal Navigation problem, where a robot must efficiently find a target object in an unknown environment. Existing implicit memory-based methods struggle with long-term memory retention and planning, while explicit map-based approaches lack rich semantic information. To address these challenges, we propose FOM-Nav, a modular framework that enhances exploration efficiency through Frontier-Object Maps and vision-language models. Our Frontier-Object Maps are built online and jointly encode spatial frontiers and fine-grained object information. Using this representation, a vision-language model performs multimodal scene understanding and high-level goal prediction, and a low-level planner then executes the predicted goal by generating an efficient trajectory. To train FOM-Nav, we automatically construct large-scale navigation datasets from real-world scanned environments. Extensive experiments validate the effectiveness of our model design and constructed dataset. FOM-Nav achieves state-of-the-art performance on the MP3D and HM3D benchmarks, particularly on the navigation efficiency metric SPL (Success weighted by Path Length), and yields promising results on a real robot.
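For readers unfamiliar with the SPL metric mentioned above, the following is a minimal sketch of how it is commonly computed in embodied navigation evaluation: each episode contributes its success flag multiplied by the ratio of the shortest-path length to the longer of the agent's path and the shortest path, and the contributions are averaged. The function and parameter names (`spl`, `successes`, `shortest_lengths`, `path_lengths`) are illustrative assumptions, and this is not the authors' evaluation code.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    successes[i]        -- whether episode i reached the goal
    shortest_lengths[i] -- shortest-path distance from start to goal
    path_lengths[i]     -- distance the agent actually travelled
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, travelled in zip(
        successes, shortest_lengths, path_lengths
    ):
        if not success:
            continue  # failed episodes contribute zero
        denom = max(travelled, shortest)
        # Assumption: an episode that starts at the goal (zero-length
        # shortest path and zero travel) counts as perfectly efficient.
        total += shortest / denom if denom > 0 else 1.0
    return total / n


if __name__ == "__main__":
    # Three hypothetical episodes: optimal success, detour success, failure.
    print(spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 5.0]))
```

Because a failed episode scores zero and a successful one is discounted by any detour, SPL can never exceed the success rate, which is why it is read as an efficiency measure alongside SR.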
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| ObjectGoal Navigation | MP3D (val) | Success Rate | 35 | 68 |
| Object Goal Navigation | HM3D v1 (val) | Success Rate (SR) | 57.3 | 34 |
| Object Navigation | HM3D v2 (val) | SR | 75.8 | 19 |
| Object Navigation | OVON v1 (val) | SR (seen) | 42.5 | 6 |
| Object Navigation | HM3D v1sub (val) | Success Rate (SR) | 0.73 | 5 |
| Object Navigation | MP3D sub (val) | SR | 44.6 | 4 |