LexiSafe: Offline Safe Reinforcement Learning with Lexicographic Safety-Reward Hierarchy
About
Offline safe reinforcement learning (RL) is increasingly important for cyber-physical systems (CPS), where safety violations during training are unacceptable and only pre-collected data are available. Existing offline safe RL methods typically balance reward-safety tradeoffs through constraint relaxation or joint optimization, but they often lack structural mechanisms to prevent safety drift. We propose LexiSafe, a lexicographic offline RL framework designed to preserve safety-aligned behavior. We first develop LexiSafe-SC, a single-cost formulation for standard offline safe RL, and derive safety-violation and performance-suboptimality bounds that together yield sample-complexity guarantees. We then extend the framework to hierarchical safety requirements with LexiSafe-MC, which supports multiple safety costs and admits its own sample-complexity analysis. Empirically, LexiSafe demonstrates reduced safety violations and improved task performance compared to constrained offline baselines. By unifying lexicographic prioritization with structural bias, LexiSafe offers a practical and theoretically grounded approach for safety-critical CPS decision-making.
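To make the idea of lexicographic prioritization concrete, the following is a minimal sketch of lexicographic selection over a finite set of candidate actions. It is not the LexiSafe algorithm itself, which the abstract does not specify in detail. The function name `lexicographic_select`, the `slack` tolerance, and the use of per-action cost and reward estimates are all assumptions made for illustration. Safety costs are processed in priority order, and reward is only used to choose among the candidates that survive every safety level.

```python
from typing import List, Sequence


def lexicographic_select(
    cost_levels: Sequence[Sequence[float]],
    rewards: Sequence[float],
    slack: float = 0.0,
) -> int:
    """Pick an action index by lexicographic priority (illustrative only).

    cost_levels: estimated costs per action, one sequence per safety
        objective, ordered from highest to lowest priority (hypothetical input).
    rewards: estimated reward per action.
    slack: assumed tolerance; an action stays admissible at a level if its
        cost is within `slack` of the best cost among remaining actions.
    """
    if not rewards:
        raise ValueError("at least one candidate action is required")
    if any(len(costs) != len(rewards) for costs in cost_levels):
        raise ValueError("each cost level needs one entry per action")

    # Start with every action admissible, then narrow level by level.
    admissible: List[int] = list(range(len(rewards)))
    for costs in cost_levels:
        best = min(costs[i] for i in admissible)
        admissible = [i for i in admissible if costs[i] <= best + slack]

    # Reward is the lowest-priority objective: it only breaks ties
    # among actions that passed every safety level.
    return max(admissible, key=lambda i: rewards[i])
```

With a single cost level this corresponds to the single-cost setting (safety first, then reward). Passing several cost levels mirrors the hierarchical, multi-cost setting. A larger `slack` admits more actions at each level, trading strictness of the safety ordering for reward.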
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Reinforcement Learning | Safety Gym HopperVel | Reward | 0.7 | 6 |
| Reinforcement Learning | Bullet Safety Gym CarRun | Reward | 0.98 | 6 |
| Reinforcement Learning | Bullet Safety Gym BallCircle | Reward | 0.71 | 6 |
| Reinforcement Learning | Bullet Safety Gym AntCircle | Reward | 0.51 | 6 |
| Reinforcement Learning | Safety Gym HalfCheetahVel | Reward | 0.97 | 6 |
| Reinforcement Learning | Safety Gym Walker2dVel | Reward | 0.78 | 6 |
| Reinforcement Learning | Bullet Safety Gym AntRun | Reward | 0.65 | 6 |
| Reinforcement Learning | Safety Gym SwimmerVel | Reward | 0.51 | 6 |
| Reinforcement Learning | Bullet Safety Gym CarCircle | Reward | 0.71 | 6 |
| Reinforcement Learning | Bullet Safety Gym DroneCircle | Reward | 0.51 | 6 |