Offline Constrained RLHF on PKU-SafeRLHF 74k (train)
[Chart: Safety Expected Reward (r1) for πλ, value -1.6034 at Mar 31, 2026. Selectable metrics: Safety Expected Reward (r1), Helpfulness Expected Reward (r2), Violation Rate.]
Updated 17d ago
Evaluation Results
Method  Policy Type   Date     Safety Expected Reward (r1)  Helpfulness Expected Reward (r2)  Violation Rate
πλ      Dual-only...  2026.03  -1.6034                      -0.6479                           0.06
π0      Reference...  2026.03  -1.8997                      -0.8867                           23.94