Preference evaluation on Anthropic-SafeRLHF benchmark
[Chart: Win Rate by date. Highlighted entry: πbias (rubric-based preference attack), Win Rate 33.7, Feb 14, 2026.]
Evaluation Results
| Method | Details | Date | Win Rate |
|---|---|---|---|
| πbias (rubric-based preference attack) | Comparison=πbias vs. π... | 2026.02 | 33.7 |
| πbias (rubric-based preference attack) | Comparison=πbias vs. π... | 2026.02 | 23.9 |