Preference Alignment on 15,000 listwise rankings (test)
[Chart: BT Score and PL Score; highlighted point: Hard Panel, BT Score 0.384, Feb 4, 2026]
Updated 4d ago
Evaluation Results
| Method | Method details | Links | BT Score | PL Score |
|---|---|---|---|---|
| Hard Panel | backbone=Llama-3.1-8B,... | 2026.02 | 0.384 | 0.375 |
| US-Rep | backbone=Llama-3.1-8B,... | 2026.02 | 0.102 | 0.113 |
| Soft Panel | backbone=Llama-3.1-8B,... | 2026.02 | 0.017 | 0.071 |
| Full PRISM | backbone=Llama-3.1-8B,... | 2026.02 | -0.07 | 0.026 |
| Base | backbone=Llama-3.1-8B,... | 2026.02 | -0.432 | -0.584 |