Rapid Exploration for Open-World Navigation with Latent Goal Models
About
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments. At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images. We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration. Trained on a large offline dataset of prior experience, the model acquires a representation of visual goals that is robust to task-irrelevant distractors. We demonstrate our method on a mobile ground robot in open-world exploration scenarios. Given an image of a goal that is up to 80 meters away, our method leverages its representation to explore and discover the goal in under 20 minutes, even amidst previously unseen obstacles and weather conditions. Videos of our experiments and information about the real-world dataset are available on the project website: https://sites.google.com/view/recon-robot.
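To make the information-bottleneck idea concrete, the following is a minimal sketch of one common way such an objective is written: a prediction loss on actions and distances plus a KL penalty that pulls the goal latent toward a standard normal prior. The abstract does not give the actual loss, so the squared-error terms, the weight `beta`, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var)
    )


def sample_latent(mean, log_var, rng):
    """Reparameterized sample z = mean + std * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def bottleneck_loss(pred_action, true_action, pred_dist, true_dist,
                    mean, log_var, beta=1.0):
    """Squared prediction error plus a beta-weighted KL penalty.

    Assumed form: the squared-error terms and beta are illustrative choices.
    """
    action_err = sum((p - t) ** 2 for p, t in zip(pred_action, true_action))
    dist_err = (pred_dist - true_dist) ** 2
    return action_err + dist_err + beta * kl_to_standard_normal(mean, log_var)


def sample_exploration_latent(dim, rng):
    """Draw a goal latent from the prior N(0, I) to propose an exploration goal."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```

Under these assumptions, the KL term is what keeps the goal representation compact, and because the latent is regularized toward the prior, drawing from `sample_exploration_latent` is a plausible way to propose goals when no goal image has been matched yet.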
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL Franka Kitchen | Mixed Success Rate | 81 | 43 |
| Robotic Manipulation | D4RL Kitchen-Partial | Normalized Score | 92 | 23 |
| Robotic Manipulation | D4RL Kitchen-Mixed | -- | -- | 14 |
| Goal-conditioned Reinforcement Learning | manipulation-cube-single-play (test) | Success Rate | 0.9 | 11 |
| Goal-conditioned Reinforcement Learning | pointmaze navigate medium | Success Rate | 69 | 11 |
| Offline goal-conditioned RL | OGBench Manipulation | Success Rate (Cube Single) | 90 | 9 |
| Robotic Manipulation | D4RL kitchen-complete | Slide Cabinet Success Rate | 25 | 9 |
| Offline goal-conditioned RL | OGBench Navigation | Success Rate (PointMaze-Medium) | 69 | 9 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | puzzle-3x3-play state-based v0 (test) | Success Rate | 14 | 6 |
| Goal-Conditioned Reinforcement Learning (Manipulation) | scene-play state-based v0 (test) | Success Rate | 58 | 6 |