Offline Constrained RLHF on PKU-SafeRLHF 7k (test)
[Chart: Safety Reward (r1) for πλ, -1.6044 as of Mar 31, 2026. Other metrics available: Helpfulness Reward (r2), Violation Rate.]
Updated 17d ago
Evaluation Results
Method  Configuration             Date     Safety Reward (r1)  Helpfulness Reward (r2)  Violation Rate
πλ      Policy Type=Dual-only...  2026.03  -1.6044             -0.6287                  0