
Constrained Decision Transformer for Offline Safe Reinforcement Learning

About

Safe reinforcement learning (RL) trains a constraint-satisfying policy by interacting with the environment. We aim to tackle a more challenging problem: learning a safe policy from an offline dataset. We study the offline safe RL problem from a novel multi-objective optimization perspective and propose the $\epsilon$-reducible concept to characterize problem difficulty. The inherent trade-off between safety and task performance inspires us to propose the constrained decision transformer (CDT) approach, which can dynamically adjust this trade-off during deployment. Extensive experiments show the advantages of the proposed method in learning an adaptive, safe, robust, and high-reward policy. CDT outperforms its variants and strong offline safe RL baselines by a large margin with the same hyperparameters across all tasks, while retaining zero-shot adaptation to different constraint thresholds, making our approach more suitable for real-world RL under constraints. The code is available at https://github.com/liuzuxin/OSRL.
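
To make the deployment-time trade-off concrete, here is a minimal sketch of the bookkeeping a return-conditioned policy typically needs. It assumes, as the name suggests but the abstract does not spell out, that CDT follows the Decision Transformer recipe of conditioning on a target reward return and a cost threshold. The helper names (`returns_to_go`, `update_targets`, `is_safe`) are hypothetical and are not taken from the OSRL repository.

```python
from typing import List, Tuple


def returns_to_go(values: List[float]) -> List[float]:
    """Suffix sums: entry t is the sum of values[t:].

    Applied to rewards this gives the reward return-to-go; applied to
    costs it gives the cost return-to-go (assumed conditioning signals).
    """
    out = [0.0] * len(values)
    running = 0.0
    for t in range(len(values) - 1, -1, -1):
        running += values[t]
        out[t] = running
    return out


def update_targets(
    target_return: float, cost_budget: float, reward: float, cost: float
) -> Tuple[float, float]:
    """Hypothetical deployment step: subtract what was just observed
    from the remaining reward target and the remaining cost budget."""
    return target_return - reward, cost_budget - cost


def is_safe(costs: List[float], cost_threshold: float) -> bool:
    """A trajectory satisfies the constraint if its cumulative cost
    does not exceed the threshold."""
    return sum(costs) <= cost_threshold


rewards = [1.0, 2.0, 3.0]
costs = [0.0, 1.0, 0.0]
reward_rtg = returns_to_go(rewards)
cost_rtg = returns_to_go(costs)
```

Under this assumption, changing the cost threshold passed in at deployment changes the conditioning signal without retraining, which is one way to read the zero-shot adaptation claim in the abstract.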

Zuxin Liu, Zijian Guo, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, Ding Zhao • 2023

Related benchmarks

Task          Dataset            Result                  Rank
Auto-bidding  AuctionNet         Score 437.7             90
Auto-bidding  AuctionNet-Sparse  Score 44.56             52
PointGoal2    Safety Gymnasium   Normalized Reward 59    21
PointButton2  Safety Gymnasium   Normalized Reward 46    21
PointGoal1    Safety Gymnasium   Normalized Reward 0.69  21
PointButton1  Safety Gymnasium   Normalized Reward 50    21
PointPush2    Safety Gymnasium   Normalized Reward 21    21
PointPush1    Safety Gymnasium   Normalized Reward 24    21
CarPush1      Safety Gymnasium   Reward 0.31             19
CarGoal2      Safety Gymnasium   Reward 0.48             19
Showing 10 of 60 rows
