Safe Reinforcement Learning on SafetyPointPush2 v0
[Figure: Mean Reward over time. Plotted point: PPO-LAG, Mean Reward 0.6, Apr 3, 2026. Selectable metrics: Mean Reward, Std Reward, Mean Cost, Std Cost.]
Updated 12d ago
Evaluation Results
Method     Time Mean (s)   Date      Mean Reward   Std Reward   Mean Cost   Std Cost
PPO-LAG    4.09            2026.04    0.60         1.58          31.34       58.17
PPO        3.78            2026.04    0.41         3.11          59.86      120.18
TRPO       4.03            2026.04    0.20         2.50         106.88      216.19
PPO-FAB    3.71            2026.04   -0.07         0.52           7.70       37.51
TRPO-LAG   4.05            2026.04   -0.77         5.51          28.22       67.21