Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

About

Overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of the critics' predictions, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrarily granular overestimation control, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.
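
The abstract does not spell out the truncation step, so the following is only a minimal NumPy sketch of one way to read it: each critic outputs a set of quantile atoms, the atoms of all critics are pooled into one mixture, and the largest atoms are dropped before averaging. The function names `truncated_mixture` and `truncated_value` and the `drop_per_critic` parameter are hypothetical, chosen here for illustration, and the toy numbers are not results from the paper.

```python
import numpy as np


def truncated_mixture(atoms, drop_per_critic):
    """Pool quantile atoms from all critics and drop the largest ones.

    atoms: array-like of shape (n_critics, n_quantiles), one row per critic.
    drop_per_critic: hypothetical knob; drop_per_critic * n_critics of the
        largest pooled atoms are removed. Larger values give a more
        pessimistic estimate.
    """
    atoms = np.asarray(atoms, dtype=float)
    n_critics, n_quantiles = atoms.shape
    if not 0 <= drop_per_critic < n_quantiles:
        raise ValueError("drop_per_critic must be in [0, n_quantiles)")
    pooled = np.sort(atoms.reshape(-1))          # ensembling: one mixture
    keep = n_critics * (n_quantiles - drop_per_critic)
    return pooled[:keep]                         # truncation: cut the right tail


def truncated_value(atoms, drop_per_critic):
    """Scalar value estimate: mean of the atoms that survive truncation."""
    return truncated_mixture(atoms, drop_per_critic).mean()


# Toy example: two critics, four atoms each, with one inflated atom per critic.
toy_atoms = [[1.0, 2.0, 3.0, 10.0],
             [0.0, 2.0, 4.0, 20.0]]
untruncated = truncated_value(toy_atoms, drop_per_critic=0)
truncated = truncated_value(toy_atoms, drop_per_critic=1)
```

In this sketch the number of dropped atoms is the control knob: with several atoms per critic it can be moved in small steps, which is one way to interpret the "arbitrarily granular" control mentioned above.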

Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov • 2020

Related benchmarks

Task                Dataset                     Metric                  Result   Rank
Continuous Control  MuJoCo Ant v4               Average Return          3.58e+3  46
Continuous Control  MuJoCo Walker2d v4          --                      --       39
Continuous Control  MuJoCo HalfCheetah v4       Average Return          1.23e+4  36
Continuous Control  Walker2D v5                 Avg Return              5.80e+3  17
Continuous Control  Hopper v5                   Average Return          3.70e+3  15
Continuous Control  Gym MuJoCo Hopper v4        Average Return          3.53e+3  15
Continuous Control  Gym MuJoCo Suite Aggregate  IQM                     1.143    15
Continuous Control  Gym MuJoCo Humanoid v4      Average Return          6.03e+3  15
Continuous Control  MuJoCo Humanoid v5          Maximum Average Return  6.33e+3  13
Continuous Control  Humanoid v5                 Average Return          5.27e+3  13
Showing 10 of 22 rows
