Continual Model-Based Reinforcement Learning with Hypernetworks
About
Effective planning in model-based reinforcement learning (MBRL) and model-predictive control (MPC) relies on the accuracy of the learned dynamics model. In many instances of MBRL and MPC, this model is assumed to be stationary and is periodically re-trained from scratch on state transition experience collected from the beginning of environment interactions. This implies that the time required to train the dynamics model, and with it the pause required between plan executions, grows linearly with the size of the collected experience. We argue that this is too slow for lifelong robot learning and propose HyperCRL, a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks. Our method has three main attributes: first, it includes dynamics learning sessions that do not revisit training data from previous tasks, so it only needs to store the most recent fixed-size portion of the state transition experience; second, it uses fixed-capacity hypernetworks to represent non-stationary and task-aware dynamics; third, it outperforms existing continual learning alternatives that rely on fixed-capacity networks, and performs competitively with baselines that remember an ever-increasing coreset of past experience. We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios, such as tasks involving pushing and door opening. Our project website, with videos, is at https://rvl.cs.toronto.edu/blog/hypercrl
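To make the idea of a task-conditional hypernetwork concrete, the following is a minimal NumPy sketch, not the authors' implementation. A small hypernetwork maps a per-task embedding to the flattened weights of a dynamics model that predicts the next state from a state-action pair. All sizes, function names, and the two-layer architectures are hypothetical choices made for illustration. The `output_preservation_penalty` function is likewise an assumption: it shows one common way, from the hypernetwork continual learning literature, to protect earlier tasks without replaying their data, by penalizing changes in the weights generated for old task embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 16
EMBED_DIM, HYPER_HIDDEN = 8, 32

# Parameter shapes of the target dynamics network (one hidden layer).
TARGET_SHAPES = [
    (STATE_DIM + ACTION_DIM, HIDDEN), (HIDDEN,),
    (HIDDEN, STATE_DIM), (STATE_DIM,),
]
N_TARGET = sum(int(np.prod(s)) for s in TARGET_SHAPES)


def init_hypernet():
    """Create fixed-capacity hypernetwork parameters (shared across tasks)."""
    return {
        "W1": rng.normal(0.0, 0.1, (EMBED_DIM, HYPER_HIDDEN)),
        "b1": np.zeros(HYPER_HIDDEN),
        "W2": rng.normal(0.0, 0.1, (HYPER_HIDDEN, N_TARGET)),
        "b2": np.zeros(N_TARGET),
    }


def generate_weights(theta, task_embedding):
    """Map a task embedding to the flattened dynamics-model weights."""
    h = np.tanh(task_embedding @ theta["W1"] + theta["b1"])
    return h @ theta["W2"] + theta["b2"]


def unflatten(flat):
    """Split a flat weight vector into the target network's arrays."""
    params, i = [], 0
    for shape in TARGET_SHAPES:
        n = int(np.prod(shape))
        params.append(flat[i:i + n].reshape(shape))
        i += n
    return params


def predict_next_state(flat_weights, state, action):
    """Dynamics model: predict the state change and add it to the state."""
    W1, b1, W2, b2 = unflatten(flat_weights)
    h = np.tanh(np.concatenate([state, action]) @ W1 + b1)
    return state + h @ W2 + b2


def output_preservation_penalty(theta, theta_snapshot, old_embeddings):
    """Assumed regularizer: squared change in generated weights for old tasks."""
    return sum(
        float(np.sum((generate_weights(theta, e)
                      - generate_weights(theta_snapshot, e)) ** 2))
        for e in old_embeddings
    )


# Usage: two tasks share one hypernetwork but get different dynamics weights.
theta = init_hypernet()
embeddings = [rng.normal(0.0, 1.0, EMBED_DIM) for _ in range(2)]
state, action = np.zeros(STATE_DIM), np.ones(ACTION_DIM)
next_states = [
    predict_next_state(generate_weights(theta, e), state, action)
    for e in embeddings
]
```

In this sketch, the only per-task storage is one embedding vector, while the hypernetwork parameters `theta` stay the same size no matter how many tasks are seen. During training on a new task, the penalty would be added to the dynamics prediction loss using a snapshot of `theta` taken before the new task began.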
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Reinforcement Learning | Pusher | Average Returns | 107 | 16 |
| Reinforcement Learning | Block Sliding | Retention (Task 1) | 82 | 12 |
| Block Sliding | Block Sliding | Task 2 Forward Transfer | 92 | 6 |
| Door Opening | door | Task 2 Score | 106 | 6 |
| Performance Retention | Door Environment (test) | Task 1 Retention (%) | 113 | 6 |
| Reinforcement Learning | door | Task 1 Score | 113 | 6 |
| Forward Transfer | Pusher Task 2 | Forward Transfer Reward | 127 | 6 |
| Forward Transfer | Pusher Task 5 | Forward Transfer Task Reward | 107 | 6 |
| Forward Transfer | Pusher Task 3 | Forward Transfer (%) | 99 | 6 |
| Forward Transfer | Pusher Task 4 | Forward Transfer Reward (%) | 94 | 6 |