| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| RewardBench 2 | LMUNIT (Qwen2.5-72B) | Factuality | 87.2 | 13 | 1mo ago |
| RewardBench 2 (test) | Distribution-Calibrated Aggregation | RB2 Factuality MAE | 0.451 | 12 | 1mo ago |
| RewardBench (test) | Consistency | Kuiper | 1.65 | 8 | 1mo ago |
| Arena-Hard RU | Qwen3-32B-RM | Best@8 Score | 92.69 | 5 | 1mo ago |