Preference Learning on Anthropic HH-RLHF+VI Preference (test)
[Chart: Overall Accuracy over time. Plotted point: MC-STL, 64 (Jan 10, 2026). Selectable metrics: Overall Accuracy, Group Accuracy, Calibration Slope, Calibration Bias, Item (1-EMD).]
Evaluation Results
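| Method | Date | Overall Accuracy | Group Accuracy | Calibration Slope | Calibration Bias | Item (1-EMD) |
|--------|---------|------|------|------|-------|------|
| MC-STL | 2026.01 | 64 | 64 | 1.02 | -0.01 | 0.61 |
| phi | 2026.01 | 60 | 59 | 0.94 | 0.03 | 0.58 |
| Maj(Y) | 2026.01 | 59 | 56 | 0.94 | 0.02 | 0.58 |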