Synthetic vs. Real Training Data for Visual Navigation
About
This paper investigates how the performance of visual navigation policies trained in simulation compares to that of policies trained with real-world data. Simulator-trained policies often suffer significant performance degradation when evaluated in the real world. However, despite this well-known sim-to-real gap, we demonstrate that simulator-trained policies can match the performance of their real-world-trained counterparts. Central to our approach is a navigation policy architecture that bridges the sim-to-real appearance gap by leveraging pretrained visual representations, and that runs in real time on robot hardware. Evaluations on a wheeled mobile robot show that the proposed policy, when trained in simulation, outperforms its real-world-trained version by 31 points and the prior state-of-the-art methods by 50 points in navigation success rate. Policy generalization is verified by deploying the same model onboard a drone. Our results highlight the importance of diverse image encoder pretraining for sim-to-real generalization, and identify on-policy learning as a key advantage of simulated training over training with real data. Code, model checkpoints, and multimedia materials are available at https://lasuomela.github.io/faint/
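The architecture described above pairs a frozen pretrained image encoder with a learned navigation policy head. The sketch below illustrates that general pattern only; the dimensions, the random-projection "encoder", and the two-dimensional velocity action space are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions chosen for illustration.
IMG_PIXELS = 3 * 64 * 64   # flattened RGB observation
FEAT_DIM = 384             # embedding size of the frozen encoder
ACT_DIM = 2                # (linear velocity, angular velocity) for a wheeled robot

# Stand-in for a frozen pretrained visual encoder: a fixed projection
# whose weights are never updated during policy training.
W_enc = rng.standard_normal((IMG_PIXELS, FEAT_DIM)) / np.sqrt(IMG_PIXELS)

def encode(image: np.ndarray) -> np.ndarray:
    """Frozen feature extraction shared by observation and goal images."""
    return np.tanh(image.reshape(-1) @ W_enc)

# Small trainable policy head on top of the frozen features.
W_head = rng.standard_normal((2 * FEAT_DIM, ACT_DIM)) * 0.01

def policy(obs_image: np.ndarray, goal_image: np.ndarray) -> np.ndarray:
    """Map (observation, goal) image pair to a velocity command."""
    feats = np.concatenate([encode(obs_image), encode(goal_image)])
    return feats @ W_head

obs = rng.random((3, 64, 64))
goal = rng.random((3, 64, 64))
action = policy(obs, goal)
print(action.shape)  # (2,)
```

Because the encoder is frozen, only `W_head` would be optimized during training, which is one common way pretrained representations are used to bridge the simulation-to-real appearance gap.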
Related benchmarks
| Task | Dataset | Success Rate (SR) | Rank |
|---|---|---|---|
| Object Goal Navigation | HM3D | 60.3 | 55 |
| Backward Visual Navigation (To Start) | Gibson | 13.2 | 48 |
| Backward Visual Navigation (To Start) | HM3D | 11.3 | 48 |
| Forward Visual Navigation (To End) | Gibson | 50.7 | 48 |
| Any-Point Visual Navigation | Gibson | 52 | 24 |
| Any-Point Visual Navigation | HM3D | 36.3 | 24 |
| Forward Visual Navigation (To End) | HM3D | 60.3 | 24 |