
Safety-constrained Reinforcement Learning on Safety-Gym SafetyPointGoal1 (evaluation)

Average Reward: 11.37

PPO

-1.2142.0535.328.587
May 14, 2026

Evaluation Results

Method  Date     Average Reward  Col. 2  Col. 3  Col. 4  Col. 5
PPO     2026.05  11.37           109.87  1       4.395   0.103
?       2026.05  8.14            90.27   0.866   3.611   0.09
?       2026.05  5.39            36.93   0.474   1.477   0.146
?       2026.05  2.99            195.33  0.318   7.813   0.015
?       2026.05  2.99            195.33  0.318   7.813   0.015
?       2026.05  2.3             41.67   0.245   1.667   0.055
?       2026.05  2.26            309.33  0.241   12.373  0.007
?       2026.05  0.68            110.67  0.072   4.427   0.006
?       2026.05  -0.12           25.67   -0.013  1.027   -0.005
?       2026.05  -0.71           101.5   -0.063  4.06    -0.007
?       2026.05  -0.73           5.67    -0.077  0.227   -0.129
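Each results row reports five numbers after the date: Average Reward followed by four metrics whose headers did not survive extraction. Within rounding, the fourth number in every row equals the second divided by 25, and the fifth equals Average Reward divided by the second. The Python sketch below recomputes those two columns for each row as a consistency check. Reading the second column as a cost and 25 as a cost limit is an assumption, not something the source states, and the names `ROWS`, `COST_LIMIT`, `derived`, and `check` are hypothetical.

```python
# Rows copied from the results table: (average_reward, col2, col3, col4, col5).
# Only "Average Reward" is a known header; the other column meanings are assumptions.
ROWS = [
    (11.37, 109.87, 1, 4.395, 0.103),
    (8.14, 90.27, 0.866, 3.611, 0.09),
    (5.39, 36.93, 0.474, 1.477, 0.146),
    (2.99, 195.33, 0.318, 7.813, 0.015),
    (2.99, 195.33, 0.318, 7.813, 0.015),
    (2.3, 41.67, 0.245, 1.667, 0.055),
    (2.26, 309.33, 0.241, 12.373, 0.007),
    (0.68, 110.67, 0.072, 4.427, 0.006),
    (-0.12, 25.67, -0.013, 1.027, -0.005),
    (-0.71, 101.5, -0.063, 4.06, -0.007),
    (-0.73, 5.67, -0.077, 0.227, -0.129),
]

# Hypothetical: the divisor that maps column 2 onto column 4 (assumed cost limit).
COST_LIMIT = 25.0


def derived(reward: float, col2: float) -> tuple[float, float]:
    """Recompute columns 4 and 5 from Average Reward and column 2."""
    return col2 / COST_LIMIT, reward / col2


def check(rows, tol: float = 1e-3) -> list[int]:
    """Return the indices of rows whose columns 4 and 5 disagree with the recomputation."""
    bad = []
    for i, (reward, col2, _col3, col4, col5) in enumerate(rows):
        c4, c5 = derived(reward, col2)
        # Reported values are rounded to at most 3 decimals, hence the tolerance.
        if abs(c4 - col4) > tol or abs(c5 - col5) > tol:
            bad.append(i)
    return bad


if __name__ == "__main__":
    print(check(ROWS))  # prints [] when every row matches
```

An empty list from `check(ROWS)` means every row is consistent with that reading. The third column is left unchecked because no single divisor reproduces it across all rows.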