Constrained Update Projection Approach to Safe Policy Optimization

About

Safe reinforcement learning (RL) studies problems in which an intelligent agent must not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method based on the Constrained Update Projection framework that enjoys a rigorous safety guarantee. Central to the development of CUP are the newly proposed surrogate functions along with the performance bound. Compared to previous safe RL methods, CUP offers three benefits: 1) it generalizes the surrogate functions to the generalized advantage estimator (GAE), leading to strong empirical performance; 2) it unifies performance bounds, providing better understanding and interpretability of some existing algorithms; 3) it provides a non-convex implementation using only first-order optimizers, which does not require any strong approximation on the convexity of the objectives. To validate CUP, we compared it against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP in terms of both reward and safety constraint satisfaction. The source code of CUP is available at https://github.com/zmsn-2077/CUP-safe-rl.
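For readers unfamiliar with the generalized advantage estimator (GAE) mentioned in the abstract, the following is a minimal sketch of the standard GAE recursion in Python with NumPy. It is not taken from the CUP source code: the function name gae_advantages, its parameters, and the default values of gamma and lam are illustrative assumptions. In a safe RL setting the same recursion could plausibly be applied to a per-step cost signal instead of the reward, but that usage is an assumption here as well.

```python
import numpy as np


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE over one trajectory (hypothetical helper, not CUP's code).

    rewards: per-step signal, length T.
    values:  value estimates, length T + 1 (last entry is the bootstrap value).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = np.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages
```

With lam=0 the estimate reduces to the one-step TD error, and with lam=1 it becomes the discounted return minus the value baseline; intermediate values trade bias against variance.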

Long Yang, Jiaming Ji, Juntao Dai, Linrui Zhang, Binbin Zhou, Pengfei Li, Yaodong Yang, Gang Pan • 2022

Related benchmarks

Task                          Dataset                          Result                     Rank
Hopper Velocity               Safety Gymnasium level-2         Safe Reward: 1.00e+3       12
Point Push                    Safety Gymnasium level-2         Safe Reward: 0.12          12
Car Goal                      Safety Gymnasium level-2         Safe Reward: 0.93          12
Point Button                  Safety Gymnasium level-2         Safe Reward: 0.08          12
Point Goal                    Safety Gymnasium level-2         Safe Reward: 0.95          12
Swimmer Velocity              Safety Gymnasium level-2         Safe Reward: 34            12
Car Push                      Safety Gymnasium level-2         Safe Reward: -0.0062       12
Car Circle                    Safety Gymnasium level-2         Safe Reward: 7.1           12
Safe Reinforcement Learning   Spring Pendulum                  Training Time (s): 81.4    7
Safe Reinforcement Learning   OPF with Battery Energy Storage  Training Time (s): 239.4   7
Showing 10 of 16 rows
