| Dataset Name | SOTA Method | Metric | Trend | Updated |
|---|---|---|---|---|
| RewardBench | CCE@16 | Accuracy 93.7 | 29 | 1mo ago |
| EvalBias | CCE@16 | Accuracy 85.9 | 16 | 1mo ago |
| JudgeBench | CCE@16 | Accuracy 75.7 | 16 | 1mo ago |
| MTBench Human | CCE@16 | Accuracy 88.9 | 16 | 1mo ago |
| HelpSteer2 | Vanilla | Accuracy 72.3 | 16 | 1mo ago |