Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning

About

In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.

Prajwal Koirala, Zhanhong Jiang, Soumik Sarkar, Cody Fleming • 2024
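
The abstract mentions a reward-Advantage Weighted Regression objective. As a rough illustration of the general advantage-weighting idea only, the sketch below computes exponentiated-advantage weights and applies them to a squared-error regression loss. It is not the authors' implementation: the function names, the `temperature` and `max_weight` parameters, and the use of squared error as a stand-in for a log-likelihood term are all assumptions made for this example, and the paper's encoder and latent constraint space are not modeled here.

```python
import math


def awr_weights(advantages, temperature=1.0, max_weight=20.0):
    """Return exp(advantage / temperature), clipped at max_weight.

    Hypothetical helper: temperature and max_weight are illustrative
    hyperparameters, not values taken from the paper.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return [min(math.exp(a / temperature), max_weight) for a in advantages]


def weighted_regression_loss(predicted, target, weights):
    """Weighted mean of per-sample squared errors.

    Squared error is used here as a simple stand-in for the negative
    log-likelihood term usually found in advantage-weighted regression.
    """
    total = 0.0
    for p, t, w in zip(predicted, target, weights):
        total += w * sum((pi - ti) ** 2 for pi, ti in zip(p, t))
    return total / len(weights)


# Samples with higher reward advantage get larger regression weights.
advantages = [-1.0, 0.0, 2.0]
weights = awr_weights(advantages, temperature=1.0)
```

A lower `temperature` concentrates the weight on the highest-advantage samples, while clipping keeps a single large advantage from dominating the loss.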

Related benchmarks

Task | Dataset | Result | Rank
Safe Reinforcement Learning | Bullet Safety Gym | Normalized Reward 0.52 | 10
Safe Reinforcement Learning | MetaDrive | Normalized Reward 0.18 | 10
Reinforcement Learning | Safety Gym HalfCheetahVel | Reward 0.97 | 6
Reinforcement Learning | Bullet Safety Gym CarCircle | Reward 0.72 | 6
Reinforcement Learning | Bullet Safety Gym DroneCircle | Reward 0.58 | 6
Reinforcement Learning | Safety Gym HopperVel | Reward 0.69 | 6
Reinforcement Learning | Bullet Safety Gym CarRun | Reward 0.97 | 6
Reinforcement Learning | Bullet Safety Gym AntCircle | Reward 0.45 | 6
Reinforcement Learning | Safety Gym Walker2dVel | Reward 0.76 | 6
Reinforcement Learning | Safety Gym AntVel | Reward 0.98 | 6
Showing 10 of 14 rows
