
Constrained Decision Transformer for Offline Safe Reinforcement Learning

About

Safe reinforcement learning (RL) trains a constraint-satisfying policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulty. The inherent trade-off between safety and task performance inspires us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust the trade-off during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while retaining zero-shot adaptation capability to different constraint thresholds, making our approach more suitable for real-world RL under constraints. The code is available at https://github.com/liuzuxin/OSRL.
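The key mechanism described in the abstract is conditioning the policy on two targets, a reward return-to-go and a cost return-to-go, so the safety/performance trade-off can be changed at deployment time simply by choosing different targets. The sketch below illustrates that inference loop; it is a minimal illustration with assumed names (`CDTContext`, `act`, `update_targets`), not the authors' implementation from the OSRL repository.

```python
from dataclasses import dataclass, field

@dataclass
class CDTContext:
    """Rolling context for a CDT-style policy (hypothetical interface)."""
    reward_to_go: float          # remaining reward we still want to collect
    cost_to_go: float            # remaining cost budget (constraint threshold)
    states: list = field(default_factory=list)
    actions: list = field(default_factory=list)

def act(policy, ctx, state):
    """One deployment step: record the state, then query the (stand-in)
    transformer policy conditioned on the current reward/cost targets."""
    ctx.states.append(state)
    action = policy(ctx.states, ctx.actions, ctx.reward_to_go, ctx.cost_to_go)
    ctx.actions.append(action)
    return action

def update_targets(ctx, reward, cost):
    """After the environment returns (reward, cost), shrink both
    returns-to-go; the cost budget is clamped at zero."""
    ctx.reward_to_go -= reward
    ctx.cost_to_go = max(ctx.cost_to_go - cost, 0.0)

if __name__ == "__main__":
    # A dummy policy stands in for the trained transformer.
    dummy_policy = lambda states, actions, rtg, ctg: 0
    ctx = CDTContext(reward_to_go=10.0, cost_to_go=2.0)
    act(dummy_policy, ctx, state=0)
    update_targets(ctx, reward=1.5, cost=0.5)
    print(ctx.reward_to_go, ctx.cost_to_go)  # 8.5 1.5
```

Zero-shot adaptation to a stricter constraint threshold then amounts to initializing `cost_to_go` with a smaller budget, with no retraining.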

Zuxin Liu, Zijian Guo, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, Ding Zhao • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Auto-bidding | AuctionNet | Score | 437.7 | 90
Auto-bidding | AuctionNet-Sparse | Score | 44.56 | 45
Safe Reinforcement Learning | Bullet Safety Gym | Normalized Reward | 0.61 | 10
Safe Reinforcement Learning | MetaDrive | Normalized Reward | 0.4 | 10
DroneRun | Bullet-Safety-Gym OSRL | Reward | 0.84 | 9
CarRun | Bullet-Safety-Gym OSRL | Reward | 0.96 | 9
CarCircle | Bullet-Safety-Gym OSRL | Reward | 0.71 | 9
BallCircle | Bullet-Safety-Gym OSRL | Reward | 0.73 | 9
BallRun | Bullet-Safety-Gym OSRL | Reward | 0.35 | 9
Constrained Bidding | AuctionNet | Value | 357.4 | 9

(Showing 10 of 19 rows)
