Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

About

Overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics. The distributional representation and truncation allow for arbitrarily granular control of overestimation, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.

Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov • 2020
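
The three ideas named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes each critic outputs a fixed number of return "atoms" for the next state-action pair, pools them across critics, drops the largest ones, and applies a plain Bellman backup. The function name `truncated_mixture_target`, the parameter `drop_per_critic`, and the toy numbers are hypothetical and chosen only for illustration. The networks, the policy, and the training loss are left out.

```python
import numpy as np

def truncated_mixture_target(next_atoms, reward, gamma, drop_per_critic):
    """Build target atoms from an ensemble of distributional critics.

    next_atoms: array of shape (n_critics, n_atoms) holding each critic's
    predicted return atoms for the next state-action pair (toy input here).
    drop_per_critic: hypothetical knob; how many atoms to discard per critic.
    """
    n_critics, n_atoms = next_atoms.shape
    # Ensembling: pool the atoms of all critics into one sorted mixture.
    pooled = np.sort(next_atoms.reshape(-1))
    # Truncation: keep only the smallest atoms, discarding the largest ones,
    # which contribute most to overestimation.
    keep = n_critics * (n_atoms - drop_per_critic)
    kept = pooled[:keep]
    # Bellman backup applied to every remaining atom.
    return reward + gamma * kept

# Two toy critics with four atoms each; both have one optimistic outlier.
atoms = np.array([[1.0, 2.0, 3.0, 10.0],
                  [0.5, 2.5, 4.0, 12.0]])

full = truncated_mixture_target(atoms, reward=0.0, gamma=1.0, drop_per_critic=0)
trunc = truncated_mixture_target(atoms, reward=0.0, gamma=1.0, drop_per_critic=1)

# The truncated target has a lower mean than the untruncated one.
print(full.mean(), trunc.mean())
```

Because the number of dropped atoms can be varied one atom at a time, this kind of truncation gives finer control over the target than, for example, taking the minimum of two scalar critics.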

Related benchmarks

Task	Dataset	Metric	Result	(unlabeled)
Continuous Control	MuJoCo Walker2d v4	--	--	51
Continuous Control	MuJoCo Ant v4	Average Return	3.58e+3	46
Continuous Control	MuJoCo HalfCheetah v4	Average Return	1.23e+4	36
Continuous Control	Walker2D v5	Avg Return	5.80e+3	17
Continuous Control	Hopper v5	Average Return	3.70e+3	15
Continuous Control	Gym MuJoCo Hopper v4	Average Return	3.53e+3	15
Continuous Control	Gym MuJoCo Suite Aggregate	IQM	1.143	15
Portfolio Trading	Mainstream Tech & Market-Index Portfolio FinRL 6-asset extended test horizon (out-of-sample)	Compound Return (CR)	145.1	15
Continuous Control	Gym MuJoCo Humanoid v4	Average Return	6.03e+3	15
Continuous Control	MuJoCo Humanoid v5	Maximum Average Return	6.33e+3	13

Showing 10 of 23 rows
