Safe Reinforcement Learning on SafetyPointGoal2 v0
[Chart: Mean Reward. Leading entry: TRPO, 15.58 (Apr 3, 2026).]
Evaluation Results
| Method | Time Mean (s) | Date | Mean Reward | Std Dev of Reward | Mean Cost | Std Dev of Cost |
|---|---|---|---|---|---|---|
| TRPO | 2.59 | 2026.04 | 15.58 | 10.31 | 164.14 | 88.43 |
| PPO | 2.47 | 2026.04 | 13.26 | 14.05 | 167.46 | 87.06 |
| TRPO-LAG | 2.78 | 2026.04 | 2.37 | 8.46 | 89.04 | 187.67 |
| PPO-LAG | 2.40 | 2026.04 | 2.24 | 5.1 | 54.1 | 64.5 |
| PPO-FAB | 2.75 | 2026.04 | -1.12 | 1.11 | 24.32 | 37.84 |
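
The results above report reward and cost separately, so comparing methods means weighing one against the other. The following is a minimal sketch of one way to do that in Python: it ranks the methods by mean cost and filters them against a cost budget. The numbers are copied from the results on this page; the names `RESULTS`, `COST_LIMIT`, `rank_by_cost`, and `within_budget` are hypothetical, and the budget of 25.0 is an illustrative assumption, since the page does not state a cost limit.

```python
# Values copied from the Evaluation Results on this page.
# Tuple layout: (mean_reward, std_reward, mean_cost, std_cost)
RESULTS = {
    "TRPO": (15.58, 10.31, 164.14, 88.43),
    "PPO": (13.26, 14.05, 167.46, 87.06),
    "TRPO-LAG": (2.37, 8.46, 89.04, 187.67),
    "PPO-LAG": (2.24, 5.1, 54.1, 64.5),
    "PPO-FAB": (-1.12, 1.11, 24.32, 37.84),
}

# Hypothetical cost budget, chosen only for illustration.
# The page itself does not state a cost limit.
COST_LIMIT = 25.0


def rank_by_cost(results):
    """Return method names ordered from lowest to highest mean cost."""
    return sorted(results, key=lambda name: results[name][2])


def within_budget(results, limit):
    """Return methods (lowest cost first) whose mean cost is at most `limit`."""
    return [name for name in rank_by_cost(results) if results[name][2] <= limit]


if __name__ == "__main__":
    for name in rank_by_cost(RESULTS):
        reward, _, cost, _ = RESULTS[name]
        print(f"{name:9s} mean reward {reward:6.2f}  mean cost {cost:7.2f}")
    print("Within budget:", within_budget(RESULTS, COST_LIMIT))
```

Sorted this way, the listing makes the trade-off in the reported numbers visible: TRPO and PPO have the highest mean reward and also the highest mean cost, while PPO-FAB has the lowest mean cost and the lowest mean reward. Under the assumed budget of 25.0, only PPO-FAB qualifies; a different budget would change that set.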