
Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

About

The Reinforcement Learning (RL) building blocks, i.e., Q-functions and policy networks, usually take elements from the Cartesian product of two domains as input. In particular, the input of the Q-function is both the state and the action, and in multi-task problems (Meta-RL) the policy can take a state and a context. Standard architectures tend to ignore these variables' underlying interpretations and simply concatenate their features into a single vector. In this work, we argue that this choice may lead to poor gradient estimation in actor-critic algorithms and high-variance learning steps in Meta-RL algorithms. To account for the interaction between the input variables, we suggest a Hypernetwork architecture, in which a primary network determines the weights of a conditional dynamic network. We show that this approach improves the gradient approximation and reduces the learning-step variance, which both accelerates learning and improves the final performance. We demonstrate a consistent improvement across different locomotion tasks and different algorithms, both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).
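To make the recomposition concrete, below is a minimal PyTorch-style sketch of such a Q-function: a primary network maps the state to the weights of a small dynamic network, which then processes the action (in Meta-RL, the primary network would analogously consume the context and the dynamic network the state). The class name `HyperQ`, the layer sizes, and the single-hidden-layer dynamic head are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HyperQ(nn.Module):
    """Sketch of a hypernetwork Q-function: state -> weights, action -> Q.

    Sizes here are illustrative, not taken from the paper.
    """

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.action_dim, self.hidden = action_dim, hidden
        # Parameters the primary network must emit for the dynamic network:
        # layer 1: action_dim -> hidden, layer 2: hidden -> 1.
        n_params = (action_dim * hidden + hidden) + (hidden + 1)
        # Primary network: maps the state to the dynamic network's parameters.
        self.primary = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, n_params),
        )

    def forward(self, state, action):
        p = self.primary(state)  # (B, n_params)
        B, d, h = state.shape[0], self.action_dim, self.hidden
        # Slice the flat parameter vector into per-sample weights and biases.
        w1, p = p[:, :d * h].view(B, h, d), p[:, d * h:]
        b1, p = p[:, :h], p[:, h:]
        w2, b2 = p[:, :h].view(B, 1, h), p[:, h:]
        # Dynamic network: processes the action with state-conditioned weights.
        x = torch.relu(torch.bmm(w1, action.unsqueeze(-1)).squeeze(-1) + b1)
        return torch.bmm(w2, x.unsqueeze(-1)).squeeze(-1) + b2  # (B, 1)

# Example usage with batched states and actions (dimensions are arbitrary).
q_fn = HyperQ(state_dim=17, action_dim=6)
q = q_fn(torch.randn(32, 17), torch.randn(32, 6))  # -> shape (32, 1)
```

Note that, unlike plain concatenation, each state induces its own set of action-processing weights, which is how the architecture models the interaction between the two input domains.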

Shai Keynan, Elad Sarafian, Sarit Kraus • 2021

Related benchmarks

Task                         Dataset                FLOPs (k)   Rank
Meta-Reinforcement Learning  Walker2d params        5.64        3
Meta-Reinforcement Learning  Hopper params          4.1         3
Meta-Reinforcement Learning  InvDoublePend params   3.59e+3     3
Meta-Reinforcement Learning  Cartpole fl-ood        1.79        3
Meta-Reinforcement Learning  Lunarlander g          3.08        3
Meta-Reinforcement Learning  Cheetah vel-ood        7.18        3
