Towards Safe Reinforcement Learning via Constraining Conditional Value-at-Risk

About

Though deep reinforcement learning (DRL) has obtained substantial success, it may encounter catastrophic failures due to the intrinsic uncertainty in both transitions and observations. Most existing methods for safe reinforcement learning can handle only one of transition disturbance or observation disturbance, since these two kinds of disturbance affect different parts of the agent; besides, the popular worst-case return may lead to overly pessimistic policies. To address these issues, we first theoretically prove that the performance degradation under transition disturbance and observation disturbance depends on a novel metric, the Value Function Range (VFR), which corresponds to the gap in the value function between the best state and the worst state. Based on this analysis, we adopt conditional value-at-risk (CVaR) as an assessment of risk and propose a novel reinforcement learning algorithm, CVaR-Proximal-Policy-Optimization (CPPO), which formalizes the risk-sensitive constrained optimization problem by keeping its CVaR under a given threshold. Experimental results show that CPPO achieves a higher cumulative reward and is more robust against both observation and transition disturbances on a series of continuous control tasks in MuJoCo.
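To make the two quantities named in the abstract concrete, here is a minimal sketch of how they could be estimated from samples. It is an illustration under stated assumptions, not the authors' implementation: it assumes risk is measured on the loss (negative return) of sampled trajectories, that CVaR at level alpha is estimated as the mean of the worst alpha-fraction of losses, and that VFR is approximated over a finite batch of state values. The function names, the `alpha` default, and the `threshold` argument are hypothetical.

```python
import numpy as np


def empirical_cvar(losses, alpha=0.1):
    """Estimate CVaR as the mean of the largest alpha-fraction of losses.

    Assumption: larger loss means worse outcome.
    """
    ordered = np.sort(np.asarray(losses, dtype=float))[::-1]  # worst first
    k = max(1, int(np.ceil(alpha * ordered.size)))  # size of the tail
    return float(ordered[:k].mean())


def cvar_constraint_violation(returns, alpha=0.1, threshold=0.0):
    """Amount by which the CVaR of the negative return exceeds a threshold.

    Returns 0.0 when the (hypothetical) constraint CVaR <= threshold holds.
    """
    losses = -np.asarray(returns, dtype=float)
    return max(0.0, empirical_cvar(losses, alpha) - threshold)


def value_function_range(state_values):
    """Gap between the best and worst value over a sampled batch of states."""
    values = np.asarray(state_values, dtype=float)
    return float(values.max() - values.min())
```

A constrained policy-gradient method could penalize a nonzero violation, for example through a Lagrange multiplier; how CPPO does this in detail is not stated in the abstract.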

Chengyang Ying, Xinning Zhou, Hang Su, Dong Yan, Ning Chen, Jun Zhu • 2022

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| 3D Bin Packing | 3D-BPP discrete setting (test) | Space Utilization 75.5 | 20 |
| Tracking | Safe-Control-Gym Cartpole Track Action Uncertainty | Average Return 113 | 7 |
| Tracking | Safe-Control-Gym Cartpole Track Observation Uncertainty | Average Return 87 | 7 |
| Tracking | Safe-Control-Gym Cartpole Track Dynamics Uncertainty | Average Return 106 | 7 |
| Stabilization | Safe-Control-Gym Cartpole Stab Observation Uncertainty | Average Return 41 | 7 |
| Stabilization | Safe-Control-Gym Cartpole Stab Dynamics Uncertainty | Average Return 77 | 7 |
| Tracking | Safe-Control-Gym Quadrotor Track Action Uncertainty | Average Return 76 | 7 |
| Stabilization | Safe-Control-Gym Cartpole Stab Action Uncertainty | Average Return 76 | 7 |
| Stabilization | Safe-Control-Gym Quadrotor Stab Action Uncertainty | Average Return 54 | 7 |
| Tracking | Safe-Control-Gym Quadrotor Track Observation Uncertainty | Average Return 152 | 7 |
Showing 10 of 13 rows
