Dropout Q-Functions for Doubly Efficient Reinforcement Learning

About

Randomized ensembled double Q-learning (REDQ) (Chen et al., 2021b) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC) (Haarnoja et al., 2018a). To make REDQ more computationally efficient, we propose DroQ, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connections and layer normalization. Despite its simplicity of implementation, our experimental results indicate that DroQ is doubly (sample- and computationally) efficient: it achieves sample efficiency comparable to REDQ, much better computational efficiency than REDQ, and computational efficiency comparable to that of SAC.

Takuya Hiraoka, Takahisa Imagawa, Taisei Hashimoto, Takashi Onishi, Yoshimasa Tsuruoka • 2021
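
The abstract's description of a dropout Q-function is concrete enough to sketch. Below is a minimal PyTorch-style illustration, not the authors' code: an MLP Q(s, a) whose hidden layers are each followed by a dropout connection and layer normalization. The class name, hidden sizes, and dropout rate are illustrative assumptions, not values quoted from the paper.

```python
import torch
import torch.nn as nn

class DropoutQFunction(nn.Module):
    """Sketch of a dropout Q-function: Q(s, a) as an MLP with a dropout
    connection and layer normalization after each hidden layer, as the
    abstract describes. Hyperparameters here are assumptions."""

    def __init__(self, state_dim: int, action_dim: int,
                 hidden_dim: int = 256, dropout_rate: float = 0.01):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.Dropout(p=dropout_rate),   # dropout connection
            nn.LayerNorm(hidden_dim),     # layer normalization
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Dropout(p=dropout_rate),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),     # scalar Q-value
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Q(s, a): concatenate state and action, then pass through the MLP.
        return self.net(torch.cat([state, action], dim=-1))
```

Per the abstract, a small ensemble of such networks replaces REDQ's large ensemble; the intuition is that dropout noise supplies the stochasticity that the large ensemble otherwise provides, which is what keeps the method computationally cheap.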

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Tractography | ISMRM in silico 2015 | VC (%) | 84.8 | 11 |
| 6-DOF Helix Trajectory Tracking | BlueROV2 Heavy Centre Locked Helix Experiment 1.0 (real-world deployment) | Positional Error X (m) | 0.088 | 4 |
| Disturbance Rejection | Disturbance rejection experiments | Positional Error X | 0.17 | 4 |
