Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation
About
We present a coarse-to-fine discretisation method that enables discrete reinforcement-learning approaches to be used in place of unstable and data-inefficient actor-critic methods in continuous robotics domains. The approach builds on the recently released ARM algorithm, replacing its continuous next-best-pose agent with a discrete one driven by coarse-to-fine Q-attention. Given a voxelised scene, coarse-to-fine Q-attention learns which part of the scene to 'zoom' into. Applied iteratively, this 'zooming' yields a near-lossless discretisation of the translation space and allows the use of a discrete-action, deep Q-learning method. We show that our new coarse-to-fine algorithm achieves state-of-the-art performance on several challenging, sparsely rewarded, vision-based RLBench manipulation tasks, and can train real-world policies, tabula rasa, in a matter of minutes with as few as 3 demonstrations.
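The iterative 'zooming' can be sketched as follows. This is a minimal illustration, not the paper's implementation: `q_fn` is a hypothetical stand-in for the learned Q-attention network, and the grid size, depth, and workspace bounds are made-up values. It shows how repeatedly re-voxelising the argmax voxel discretises translation near-losslessly, since the effective resolution shrinks geometrically with depth.

```python
import numpy as np

def coarse_to_fine_translation(q_fn, lo, hi, grid=16, depth=2):
    """Iteratively 'zoom' into the voxel with the highest Q-value.

    q_fn: maps candidate voxel centres (M, 3) to Q-values (M,)
          (a stand-in for the learned Q-attention network).
    lo, hi: workspace bounds, each of shape (3,).
    Returns the selected translation and the final per-axis voxel size.
    """
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    for _ in range(depth):
        # Voxelise the current bounds into a grid x grid x grid volume.
        edges = [np.linspace(lo[d], hi[d], grid + 1) for d in range(3)]
        centres = np.stack(
            np.meshgrid(*[(e[:-1] + e[1:]) / 2 for e in edges], indexing="ij"),
            axis=-1).reshape(-1, 3)
        # Pick the voxel the Q-function attends to most.
        idx = np.unravel_index(int(np.argmax(q_fn(centres))), (grid,) * 3)
        # Zoom: the chosen voxel becomes the next level's workspace.
        lo = np.array([edges[d][idx[d]] for d in range(3)])
        hi = np.array([edges[d][idx[d] + 1] for d in range(3)])
    return (lo + hi) / 2, hi - lo

# Toy Q-function: prefers voxel centres closest to a hypothetical target.
target = np.array([0.12, -0.34, 0.56])
q = lambda c: -np.linalg.norm(c - target, axis=1)

pos, res = coarse_to_fine_translation(q, lo=[-1, -1, 0], hi=[1, 1, 2])
# Two levels of a 16^3 grid over a 2 m range give 2 / 16^2 ≈ 7.8 mm voxels,
# so the returned position lies within one voxel of the target.
```

Note how each level multiplies the resolution by the grid size: depth levels of an N^3 grid discretise each axis into N^depth bins while only ever evaluating depth × N^3 voxels.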
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robotic Manipulation | RLBench | Avg Success Score | 20.1 | 56 |
| Robotic Manipulation | RLBench (test) | Average Success Rate | 20.1 | 34 |
| Multi-task Robotic Manipulation | RLBench | Avg Success Rate | 16.9 | 16 |
| drag stick | RLBench | Success Rate | 72 | 10 |
| close jar | RLBench | Success Rate | 28 | 10 |
| open drawer | RLBench | Success Rate | 28 | 10 |
| slide block | RLBench | Success Rate | 16 | 10 |
| stack blocks | RLBench | Success Rate | 4 | 10 |
| turn tap | RLBench | Success Rate | 68 | 10 |
| meat off grill | RLBench | Success Rate | 40 | 10 |