| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| Multimodal RewardBench | Gemini-2.5-Pro | Accuracy 85.4 | 17 | 4d ago | |
| VL-RewardBench | EGT | Accuracy 77.15 | 17 | 4d ago | |
| VL-RewardBench, Multimodal RewardBench, and MM-RLHF-RewardBench Aggregate | EGT | Accuracy 82.44 | 9 | 4d ago | |
| MM-RLHF-RewardBench | EGT | Accuracy 85.88 | 9 | 4d ago | |
| PhyCritic-Bench | Gemini-2.5-Pro | Overall Score 78.2 | 8 | 4d ago | |
| RewardBench 2 | SW-RM-V2-LLaMA3.1-8B | Safety Score 96.7 | 5 | 4d ago | |
| UniReward In-Domain (test) | UniRM | Quality Score 99.3 | 5 | 4d ago | |