
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

About

Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when encountering observation deviation under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
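The idea of regularizing the value function on states near the dataset can be illustrated with a small sketch. This is not the authors' implementation: it assumes an L-infinity perturbation ball of radius `epsilon`, uses random sampling to pick nearby states, and relies on a hypothetical toy Q-function (`toy_q`). All function names and parameters are illustrative.

```python
import numpy as np


def sample_perturbed_states(state, epsilon, n_samples, rng):
    """Draw states uniformly from the L-infinity ball of radius epsilon around `state`."""
    noise = rng.uniform(-epsilon, epsilon, size=(n_samples, state.shape[0]))
    return state[None, :] + noise


def smoothness_penalty(q_fn, state, action, epsilon=0.01, n_samples=8, seed=0):
    """Worst-case squared change in Q over sampled states near a dataset state.

    Assumption: random sampling stands in for whatever selection rule
    the actual method uses to choose perturbed states.
    """
    rng = np.random.default_rng(seed)
    perturbed = sample_perturbed_states(state, epsilon, n_samples, rng)
    q_clean = q_fn(state[None, :], action[None, :])[0]
    actions = np.repeat(action[None, :], n_samples, axis=0)
    q_perturbed = q_fn(perturbed, actions)
    # Penalize the largest deviation between Q at the clean and perturbed states.
    return float(np.max((q_perturbed - q_clean) ** 2))


def toy_q(states, actions):
    """Hypothetical linear Q-function, used only to make the sketch runnable."""
    return states.sum(axis=1) + actions.sum(axis=1)


state = np.array([0.5, -0.2, 1.0])
action = np.array([0.1])
penalty = smoothness_penalty(toy_q, state, action, epsilon=0.1)
```

In a training loop, a term like `penalty` would be added to the usual value loss with a weighting coefficient, so that the learned Q-function changes little under small observation perturbations.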

Rui Yang, Chenjia Bai, Xiaoteng Ma, Zhaoran Wang, Chongjie Zhang, Lei Han • 2022

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL walker2d-random | Normalized Score | 21.4 | 77 |
| Offline Reinforcement Learning | D4RL Gym walker2d (medium-replay) | Normalized Return | 90.4 | 52 |
| Offline Reinforcement Learning | D4RL Gym halfcheetah-medium | Normalized Return | 66.8 | 44 |
| Offline Reinforcement Learning | D4RL Gym walker2d medium | Normalized Return | 102.4 | 42 |
| Offline Reinforcement Learning | antmaze medium-play | Score | 76.3 | 35 |
| Offline Reinforcement Learning | D4RL Locomotion medium, medium-replay, medium-expert v2 | Score (HalfCheetah, Medium) | 66.8 | 34 |
| Hand Manipulation | Adroit door-human | Normalized Avg Score | 3.78 | 33 |
| Offline Reinforcement Learning | D4RL Gym walker2d medium-expert | Normalized Average Return | 121.2 | 31 |
| Offline Reinforcement Learning | D4RL Gym hopper-medium-expert | Normalized Avg Return | 112.7 | 29 |
| Offline Reinforcement Learning | D4RL Gym halfcheetah-medium-expert | Normalized Return | 107.8 | 28 |
Showing 10 of 38 rows
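For context on the normalized values above: D4RL results are commonly reported on a scale where a random policy maps to 0 and an expert policy maps to 100. A minimal sketch of that rescaling follows. The function name and the reference returns in the usage lines are hypothetical placeholders, not actual D4RL reference values.

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw episode return so that random -> 0 and expert -> 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Hypothetical reference returns, chosen only to illustrate the arithmetic.
random_ref, expert_ref = 0.0, 4000.0
halfway = d4rl_normalized_score(2000.0, random_ref, expert_ref)
above_expert = d4rl_normalized_score(4400.0, random_ref, expert_ref)
```

Because the scale is anchored to a reference expert rather than capped, a policy that outperforms that expert receives a score above 100, which is how entries such as 121.2 can arise.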
