Debiased Model-based Representations for Sample-efficient Continuous Control
About
Model-based representations have recently emerged as a promising framework that embeds latent dynamics information into the representations used for downstream off-policy actor-critic learning. The framework implicitly combines the advantages of model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experiences in the replay buffer. Both problems bias representation learning and actor-critic learning, leading to inferior performance. To address these issues, we propose Debiased model-based Representations for Q-learning, dubbed the DR.Q algorithm. In addition to minimizing the deviation between the representation of the current state-action pair and that of the next state, DR.Q explicitly maximizes the mutual information between them, and it samples transitions with faded prioritized experience replay. We evaluate DR.Q on numerous continuous control benchmarks with a single set of hyperparameters, and the results demonstrate that DR.Q can match or surpass recent strong baselines, sometimes outperforming them by a large margin. Our code is available at https://github.com/dmksjfl/DR.Q.
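To make the representation objective concrete, the following is a minimal sketch of a loss that both minimizes the deviation between the state-action embedding and the next-state embedding and encourages high mutual information between them. It is not the DR.Q implementation: the abstract does not say which mutual-information estimator is used, so the sketch assumes InfoNCE, a standard contrastive lower bound (minimizing the InfoNCE loss maximizes a lower bound on mutual information). The function names, `mi_weight`, and `temperature` are hypothetical, and faded prioritized experience replay is not covered here.

```python
import math


def mse(zsa, zs_next):
    """Mean squared deviation between state-action and next-state embeddings."""
    total, count = 0.0, 0
    for pred, target in zip(zsa, zs_next):
        for a, b in zip(pred, target):
            total += (a - b) ** 2
            count += 1
    return total / count


def info_nce(zsa, zs_next, temperature=0.1):
    """InfoNCE loss over a batch (assumed estimator, not taken from the paper).

    For row i, zs_next[i] is the positive and every other zs_next[j] in the
    batch is a negative. Similarity is a dot product scaled by temperature.
    """
    n = len(zsa)
    loss = 0.0
    for i in range(n):
        logits = [
            sum(a * b for a, b in zip(zsa[i], zs_next[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # cross-entropy with the positive at index i
    return loss / n


def representation_loss(zsa, zs_next, mi_weight=1.0, temperature=0.1):
    """Deviation term plus a weighted contrastive (mutual-information) term."""
    return mse(zsa, zs_next) + mi_weight * info_nce(zsa, zs_next, temperature)


# Toy batch: embeddings that match their own next state score a lower loss
# than the same embeddings paired with the wrong next state.
aligned = representation_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = representation_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In an actual agent the embeddings would come from learned encoders and the loss would be minimized by gradient descent; plain Python lists are used here only to keep the sketch dependency-free.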
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Continuous Control | MuJoCo Ant v4 | Average Return | 8.14e+3 | 46 |
| Continuous Control | MuJoCo Walker2d v4 | -- | -- | 39 |
| Continuous Control | MuJoCo HalfCheetah v4 | Average Return | 1.48e+4 | 36 |
| Continuous Control | Gym MuJoCo Humanoid v4 | Average Return | 1.12e+4 | 15 |
| Continuous Control | Gym MuJoCo Suite Aggregate | IQM | 1.691 | 15 |
| Continuous Control | Gym MuJoCo Hopper v4 | Average Return | 2.50e+3 | 15 |
| Continuous Control | DMC Suite Hard v1 (test) | Dog Run Return | 721 | 12 |
| Continuous Control | HumanoidBench (w/ Hand) | Return (Slide) | 285 | 12 |
| Continuous Control | DMC Easy | Acrobot Swingup Return | 569 | 9 |
| Continuous Control | HumanoidBench without dexterous hands | Pole Score | 887 | 8 |