
Debiased Model-based Representations for Sample-efficient Continuous Control

About

Model-based representations have recently emerged as a promising framework that embeds latent dynamics information into the representations used for downstream off-policy actor-critic learning. The framework implicitly combines the advantages of model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experiences in the replay buffer. These shortcomings introduce biases into representation and actor-critic learning, leading to inferior performance. To address this, we propose Debiased model-based Representations for Q-learning, dubbed the DR.Q algorithm. DR.Q explicitly maximizes the mutual information between the representations of the current state-action pair and the next state, in addition to minimizing their deviation, and samples transitions with faded prioritized experience replay. We evaluate DR.Q on numerous continuous control benchmarks with a single set of hyperparameters, and the results demonstrate that DR.Q can match or surpass recent strong baselines, sometimes outperforming them by a large margin. Our code is available at https://github.com/dmksjfl/DR.Q.
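The abstract does not say which mutual information estimator DR.Q uses, so the following is only a minimal sketch of the general idea: a deviation term (mean squared error) between the state-action representation and the next-state representation, plus an InfoNCE contrastive term, which is one common lower-bound surrogate for mutual information. The function names, the `mi_weight` and `temperature` parameters, and the choice of InfoNCE are all assumptions made for illustration, not the authors' implementation; see the linked repository for the actual method.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mse(pred, target):
    """Mean squared deviation between two batches of vectors."""
    n, d = len(pred), len(pred[0])
    return sum((a - b) ** 2 for p, t in zip(pred, target) for a, b in zip(p, t)) / (n * d)


def info_nce(pred, target, temperature=0.1):
    """InfoNCE loss: row i of `pred` should score highest against row i of `target`.

    Minimizing this loss maximizes a lower bound on the mutual information
    between the two sets of representations (assumed estimator, not from the paper).
    """
    n = len(pred)
    total = 0.0
    for i in range(n):
        logits = [dot(pred[i], target[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / n


def representation_loss(z_sa, z_next, mi_weight=1.0, temperature=0.1):
    """Hypothetical combined objective: deviation term plus weighted InfoNCE term."""
    return mse(z_sa, z_next) + mi_weight * info_nce(z_sa, z_next, temperature)


# Toy batch: matched pairs should give a lower loss than mismatched pairs.
z_sa = [[1.0, 0.0], [0.0, 1.0]]
z_next = [[1.0, 0.0], [0.0, 1.0]]
aligned = representation_loss(z_sa, z_next)
shuffled = representation_loss(z_sa, z_next[::-1])
```

In this toy batch, `aligned` is close to zero because each state-action representation matches its own next-state representation, while `shuffled` is much larger because both the deviation and the contrastive term penalize the mismatch.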

Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| Continuous Control | MuJoCo Ant v4 | Average Return | 8.14e+3 | 46 |
| Continuous Control | MuJoCo Walker2d v4 | -- | -- | 39 |
| Continuous Control | MuJoCo HalfCheetah v4 | Average Return | 1.48e+4 | 36 |
| Continuous Control | Gym MuJoCo Humanoid v4 | Average Return | 1.12e+4 | 15 |
| Continuous Control | Gym MuJoCo Suite Aggregate | IQM | 1.691 | 15 |
| Continuous Control | Gym MuJoCo Hopper v4 | Average Return | 2.50e+3 | 15 |
| Continuous Control | DMC Suite Hard v1 (test) | Dog Run Return | 721 | 12 |
| Continuous Control | HumanoidBench (w/ Hand) | Return (Slide) | 285 | 12 |
| Continuous Control | DMC Easy | Acrobot Swingup Return | 569 | 9 |
| Continuous Control | HumanoidBench without dexterous hands | Pole Score | 887 | 8 |
Showing 10 of 11 rows
