| RewardBench | Skywork-Reward-V2-Llama-3.1-8B-40M | Accuracy 97.8 | | 166 | 8d ago |
| RewardBench | FsfairX-LLaMA3-RM-8B | Chat Score 99.4 | | 146 | 2d ago |
| RM-Bench | Skywork-Reward-V2-Llama-3.1-8B-40M | Accuracy 96 | | 125 | 8d ago |
| RMB | Skywork-Reward-V2-Llama-3.1-8B-40M | Accuracy 89.3 | | 120 | 8d ago |
| JudgeBench | OpenRS | Accuracy 93.3 | | 105 | 8d ago |
| RewardBench v1.0 (test) | Skywork-Reward-V2-Llama-3.1-8B-40M | Average Score 0.978 | | 89 | 1mo ago |
| RewardBench Focus 2 | Rubric-ARM-voting@5 | Accuracy 90.3 | | 82 | 1mo ago |
| RewardBench v2 | | Accuracy 92.1 | | 72 | 1mo ago |
| RewardBench Precise IF 2 | | Accuracy 57.5 | | 70 | 1mo ago |
| RM-Bench (test) | Skywork-Reward-V2-Llama-3.1-8B-40M | Overall Score 96 | | 63 | 1mo ago |
| PPE-Preference | Skywork-Reward-V2-Llama-3.1-8B-40M | Accuracy 79.8 | | 60 | 1mo ago |
| RewardBench Average 2 | FLIP | Accuracy 39.7 | | 52 | 1mo ago |
| RewardBench Math 2 | Pointwise Rating | Accuracy 35.7 | | 52 | 1mo ago |
| RM Bench Code | Skywork-Reward-Gemma-2-27B | EF 0.154 | | 52 | 1mo ago |
| Reward Bench Math | internlm2-20b-reward | EF 0.305 | | 52 | 1mo ago |
| HelpSteer (test) | | MAE 0.2428 | | 48 | 24d ago |
| Aggregate of 7 benchmarks (HelpSteer3, Reward Bench V2, SCAN-HPD, HREF, LitBench, WQ_Arena, WPB) | | Overall Accuracy 74.56 | | 45 | 1mo ago |
| RewardBench v2 (test) | Skywork-Reward-V2-Llama-3.1-8B-40M | Average Score 86.5 | | 42 | 1mo ago |
| PPE Correlation | Skywork-Reward-V2-Llama-3.1-8B-40M | Correlation 87.2 | | 40 | 1mo ago |
| Unified Feedback (UF) | GRM-SFT | Accuracy 78.9 | | 40 | 1mo ago |
| JudgeBench (test) | Qwen3-30B-A3B | Overall 82 | | 40 | 1mo ago |
| HelpSteer 3 | | Accuracy 83.15 | | 39 | 1mo ago |
| LitBench | AC-GenRM | Accuracy 80.7 | | 36 | 10d ago |
| RM-Bench Chat Hard | | Accuracy 83.3 | | 34 | 1mo ago |
| PPE Correctness | OPRM-RgFT-Qwen2.5-32B | Accuracy 67.3 | | 33 | 1mo ago |