Reward Modeling Evaluation on Reward Bench Safety 2
Top result: 72.3 Pairwise Accuracy, Distribution-Calibrated Aggregation (Dec 2, 2025)
Evaluation Results
Method                               Settings                    Date      Pairwise Accuracy
Distribution-Calibrated Aggregation  n=12, Judge LLM=gemini...   2025.12   72.3
Distribution-Calibrated Aggregation  n=4, Judge LLM=gemini-...   2025.12   69.1
SC                                   n=4, Judge LLM=gemini-...   2025.12   65
Soft-SC                              n=4, Judge LLM=gemini-...   2025.12   63.5
Soft-SC                              n=12, Judge LLM=gemini...   2025.12   63.3
SC                                   n=12, Judge LLM=gemini...   2025.12   63
CI-SC                                n=12, Judge LLM=gemini...   2025.12   62.9
CI-SC                                n=4, Judge LLM=gemini-...   2025.12   62.6
GSC                                  n=4, Judge LLM=gemini-...   2025.12   62.5
USC                                  n=12, Judge LLM=gemini...   2025.12   62.3
USC                                  n=4, Judge LLM=gemini-...   2025.12   61.9
GSC                                  n=12, Judge LLM=gemini...   2025.12   61.8
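
For readers unfamiliar with the metric, the following is a minimal sketch of how a Pairwise Accuracy score might be computed when a judge is sampled n times per example. It rests on assumptions the page does not confirm: that each judge sample yields a preference label ("A" or "B"), that the n samples are combined by plain majority vote, and that the score is the percentage of pairs where the combined preference matches the reference label. The function names `majority_vote` and `pairwise_accuracy` are hypothetical, and the votes below are made-up illustration data, not benchmark results. This is not the Distribution-Calibrated Aggregation method or any other method listed above.

```python
from collections import Counter
from typing import Sequence


def majority_vote(votes: Sequence[str]) -> str:
    """Return the most frequent label among the judge samples.

    Assumption: plain majority voting. With an even number of samples a
    tie is possible; this sketch simply takes whichever label Counter
    reports first and does not try to break ties in a principled way.
    """
    if not votes:
        raise ValueError("at least one judge vote is required")
    return Counter(votes).most_common(1)[0][0]


def pairwise_accuracy(
    judge_votes: Sequence[Sequence[str]], gold: Sequence[str]
) -> float:
    """Percentage of pairs whose aggregated preference matches the gold label."""
    if len(judge_votes) != len(gold):
        raise ValueError("judge_votes and gold must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    correct = sum(
        majority_vote(votes) == label for votes, label in zip(judge_votes, gold)
    )
    return 100.0 * correct / len(gold)


# Hypothetical data: 4 comparison pairs, 3 judge samples each.
votes = [
    ["A", "A", "B"],  # majority A
    ["B", "B", "A"],  # majority B
    ["B", "A", "B"],  # majority B
    ["A", "A", "A"],  # majority A
]
gold = ["A", "B", "A", "A"]

print(pairwise_accuracy(votes, gold))  # 3 of 4 pairs match: 75.0
```

Under these assumptions, raising n only helps when individual judge samples are right more often than wrong; the table itself shows several methods scoring no higher at n=12 than at n=4.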