Preference Evaluation on Anthropic-SafeRLHF
Top result: 41.7 Win Rate, πbias (rubric-based preference attack), Feb 14, 2026
Updated 4d ago
Evaluation Results
Method | Comparison | Date | Win Rate
πbias (rubric-based preference attack) | πbias vs. π... | 2026.02 | 41.7
πbias (rubric-based preference attack) | πbias vs. π... | 2026.02 | 34.1