COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration
About
Data efficiency and robustness to task-irrelevant perturbations are long-standing challenges for deep reinforcement learning algorithms. Here we introduce a modular approach to addressing these challenges in a continuous control environment, without using hand-crafted or supervised information. Our Curious Object-Based seaRch Agent (COBRA) uses task-free intrinsically motivated exploration and unsupervised learning to build object-based models of its environment and action space. Subsequently, it can learn a variety of tasks through model-based search in very few steps and excel on structured hold-out tests of policy robustness.
Nicholas Watters, Loic Matthey, Matko Bosnjak, Christopher P. Burgess, Alexander Lerchner • 2019
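To make the phrase "model-based search" in the abstract more concrete, here is a minimal sketch of one-step random-shooting search in a toy 2D point environment. It is an illustration under stated assumptions, not the authors' implementation: `transition_model`, `reward`, `search_action`, and `run_episode` are hypothetical names, the transition model is a hand-written stand-in for a learned one, and the reward is a simple distance to a goal.

```python
import math
import random


def transition_model(state, action):
    """Hypothetical stand-in for a learned transition model: shifts a point by the action."""
    return (state[0] + action[0], state[1] + action[1])


def reward(state, goal):
    """Toy task reward (an assumption): negative Euclidean distance to the goal."""
    return -math.dist(state, goal)


def search_action(state, goal, rng, num_candidates=64, max_step=0.1):
    """One-step random shooting: sample candidate actions, score each predicted next state."""
    best_action, best_score = None, -math.inf
    for _ in range(num_candidates):
        action = (rng.uniform(-max_step, max_step), rng.uniform(-max_step, max_step))
        score = reward(transition_model(state, action), goal)
        if score > best_score:
            best_action, best_score = action, score
    return best_action


def run_episode(start, goal, steps=50, seed=0):
    """Act greedily with respect to the model; here the model doubles as the environment."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        state = transition_model(state, search_action(state, goal, rng))
    return state


goal = (0.8, 0.5)
final_state = run_episode(start=(0.0, 0.0), goal=goal)
start_dist = math.dist((0.0, 0.0), goal)
final_dist = math.dist(final_state, goal)
print(f"distance to goal: {start_dist:.3f} -> {final_dist:.3f}")
```

In a real agent the transition model would be learned during the task-free exploration phase and the chosen action would be executed in the actual environment; the sketch collapses both into one function to stay self-contained.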
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Push and Switch | OpenAI Fetch - Push and Switch 3-Push + 3-Switch (S+O) (test) | Success Rate | 69.4 | 18 |
| Object Comparison | Spriteworld (train) | Success Rate | 92.3 | 9 |
| Property Comparison | Spriteworld (train) | Success Rate | 91.8 | 9 |
| Push | OpenAI Fetch Push 2-Push (L) (test) | Success Rate | 96.4 | 9 |
| Object Comparison | Spriteworld | Success Rate | 83.4 | 9 |
| Object Comparison | Spriteworld unseen object numbers (test) | Avg Success Rate | 87.6 | 9 |
| Property Comparison | Spriteworld | Success Rate | 80.5 | 9 |
| Push and Switch | OpenAI Fetch - Push and Switch 2-Push + 2-Switch (L+S) (test) | Success Rate | 59.1 | 9 |
| Switch | OpenAI Fetch 3-Switch (L+O) (test) | Success Rate | 83.5 | 9 |
| Property Comparison | Spriteworld unseen object numbers (test) | Avg Success Rate | 86.5 | 9 |
Showing 10 of 23 rows