Reward Modeling Evaluation on Reward Bench Ties 2
Top result: 91.8 Pairwise Accuracy, Distribution-Calibrated Aggregation (Dec 2, 2025)
Evaluation Results
Method                               Settings                      Date     Pairwise Accuracy
Distribution-Calibrated Aggregation  n=12, Judge LLM=gemini...     2025.12  91.8
Distribution-Calibrated Aggregation  n=4, Judge LLM=gemini-...     2025.12  90.5
SC                                   n=4, Judge LLM=gemini-...     2025.12  84.4
SC                                   n=12, Judge LLM=gemini...     2025.12  84.2
CI-SC                                n=12, Judge LLM=gemini...     2025.12  83.4
Soft-SC                              n=4, Judge LLM=gemini-...     2025.12  82.3
Soft-SC                              n=12, Judge LLM=gemini...     2025.12  82.2
CI-SC                                n=4, Judge LLM=gemini-...     2025.12  82.2
GSC                                  n=12, Judge LLM=gemini...     2025.12  80.4
GSC                                  n=4, Judge LLM=gemini-...     2025.12  79.2
USC                                  n=12, Judge LLM=gemini...     2025.12  77.9
USC                                  n=4, Judge LLM=gemini-...     2025.12  77.3