| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| RewardBench 2 | Scalar RM | Factuality | 88.2 | 21 | 2d ago |
| RewardBench | | Spearman ρ | 0.542 | 20 | 16d ago |
| RewardBench | SAVE | Accuracy | 93.9 | 12 | 2d ago |
| Reward Bench 2 (test) | Distribution-Calibrated Aggregation | RB2 Factuality MAE | 0.451 | 12 | 3mo ago |
| RewardBench (test) | Consistency | Kuiper | 1.65 | 8 | 3mo ago |
| RewardBench | DPO | P-value | 0.001 | 7 | 21d ago |
| Meta-World Object OOD | FLORA | Process Alignment Correlation (ρ) | 0.81 | 5 | 12d ago |
| Meta-World Viewpoint OOD | FLORA | Process Alignment ρ | 0.88 | 5 | 12d ago |
| Meta-World Position OOD | FLORA | Process Alignment ρ | 0.85 | 5 | 12d ago |
| Meta-World (train) | FLORA | Procedural Alignment Correlation (ρ) | 0.97 | 5 | 12d ago |
| Arena-Hard RU | Qwen3-32B-RM | Best@8 Score | 92.69 | 5 | 3mo ago |