| RewardBench | FsfairX-LLaMA3-RM-8B | Chat Score 99.4 | | 216 | 7d ago |
| RewardBench | Skywork-Reward-V2-Llama-3.1-8B-40M | Accuracy 97.8 | | 166 | 1mo ago |
| RM-Bench | Skywork-Reward-V2-Llama-3.1-8B-40M | Accuracy 96 | | 137 | 1d ago |
| RMB | Skywork-Reward-V2-Llama-3.1-8B-40M | Accuracy 89.3 | | 120 | 1mo ago |
| JudgeBench | OpenRS | Accuracy 93.3 | | 117 | 1d ago |
| RewardBench v1.0 (test) | Skywork-Reward-V2-Llama-3.1-8B-40M | Average Score 0.978 | | 89 | 3mo ago |
| RewardBench Focus 2 | Rubric-ARM-voting@5 | Accuracy 90.3 | | 82 | 3mo ago |
| RewardBench v2 | | Accuracy 92.1 | | 72 | 3mo ago |
| PPE-Preference | Skywork-Reward-V2-Llama-3.1-8B-40M | Accuracy 79.8 | | 72 | 1d ago |
| RewardBench Precise IF 2 | | Accuracy 57.5 | | 70 | 3mo ago |
| RewardBench v2 (test) | Skywork-Reward-V2-Llama-3.1-8B-40M | Average Score 86.5 | | 67 | 1mo ago |
| HelpSteer (test) | ILDE | MAE 0.077 | | 65 | 25d ago |
| RM-Bench (test) | Skywork-Reward-V2-Llama-3.1-8B-40M | Overall Score 96 | | 63 | 3mo ago |
| HelpSteer 3 | | Accuracy 83.15 | | 62 | 4d ago |
| RewardBench Average 2 | FLIP | Accuracy 39.7 | | 52 | 3mo ago |
| RewardBench Math 2 | Pointwise Rating | Accuracy 35.7 | | 52 | 3mo ago |
| RM Bench Code | Skywork-Reward-Gemma-2-27B | EF 0.154 | | 52 | 3mo ago |
| Reward Bench Math | internlm2-20b-reward | EF 0.305 | | 52 | 3mo ago |
| Aggregate of 7 benchmarks (HelpSteer3, Reward Bench V2, SCAN-HPD, HREF, LitBench, WQ_Arena, WPB) | | Overall Accuracy 74.56 | | 45 | 3mo ago |
| PPE Correctness | SAVE | Accuracy 71.2 | | 45 | 1d ago |
| RM-Bench Chat | | Accuracy 78.5 | | 42 | 4d ago |
| RewardBench Chat | | Accuracy 96.4 | | 42 | 4d ago |
| RewardBench 2 | Qwen3.5-35B-A3B w/ Hybrid Reward | Precise IF Score 71 | | 41 | 4d ago |
| RewardBench (full) | HyRe (best weight oracle)* + Skywork-Llama-3.1-8B | Chat Score 99.2 | | 41 | 14d ago |
| PPE Correlation | Skywork-Reward-V2-Llama-3.1-8B-40M | Correlation 87.2 | | 40 | 3mo ago |