Reward Modeling Evaluation on Reward Bench Math 2
Chart: Pairwise Accuracy. Top entry: 72.3, Distribution-Calibrated Aggregation (Dec 2, 2025).
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Pairwise Accuracy |
|---|---|---|---|
| Distribution-Calibrated Aggregation | n=12, Judge LLM=gemini... | 2025.12 | 72.3 |
| Distribution-Calibrated Aggregation | n=4, Judge LLM=gemini-... | 2025.12 | 70.9 |
| SC | n=4, Judge LLM=gemini-... | 2025.12 | 65.8 |
| Soft-SC | n=12, Judge LLM=gemini... | 2025.12 | 65.4 |
| SC | n=12, Judge LLM=gemini... | 2025.12 | 63.5 |
| CI-SC | n=12, Judge LLM=gemini... | 2025.12 | 63.4 |
| CI-SC | n=4, Judge LLM=gemini-... | 2025.12 | 63.2 |
| Soft-SC | n=4, Judge LLM=gemini-... | 2025.12 | 62.6 |
| USC | n=12, Judge LLM=gemini... | 2025.12 | 61.9 |
| USC | n=4, Judge LLM=gemini-... | 2025.12 | 61.6 |
| GSC | n=4, Judge LLM=gemini-... | 2025.12 | 60.9 |
| GSC | n=12, Judge LLM=gemini... | 2025.12 | 60.5 |