| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| PRISM (test) | EXACT | Accuracy | 66.62 | 51 | 1mo ago |
| PPE Preference (test) | Skywork-Reward-V2-Llama-3.1-8B-40M | Preference Score | 79.8 | 24 | 1mo ago |
| JudgeBench | | Positional Consistent Accuracy | 63.9 | 10 | 3d ago |
| RewardBench 2 | | Accuracy | 73.9 | 10 | 3d ago |
| RM-Bench | C2 | Accuracy | 87.8 | 10 | 3d ago |
| RewardBench | C2 | Accuracy | 91.8 | 10 | 3d ago |
| Pets | SPL | Accuracy | 100 | 8 | 1mo ago |
| UltraFeedback 500 held-out users (test) | RFM(32) | Test Accuracy | 70.53 | 7 | 1mo ago |
| Meta-World Pick-Place (Novel Task) | ReCouPLe-EC | Reward Accuracy | 66.3 | 4 | 1mo ago |
| Meta-World Pick-Place-Wall (train) | ReCouPLe-IC | Reward Accuracy | 65.7 | 4 | 1mo ago |
| Meta-World Push-Wall (train) | RFP | Reward Accuracy | 90 | 4 | 1mo ago |
| Meta-World Push (train) | ReCouPLe-IC | Reward Accuracy | 89.3 | 4 | 1mo ago |
| All Datasets Total | WIMHF | Significant Features Count (S) | 43 | 2 | 5d ago |
| WIMHF | | Number of Significant Features (S) | 10 | 2 | 5d ago |
| PKU | WIMHF | Significant Features Count (S) | 8 | 2 | 5d ago |
| HH-RLHF | WIMHF | Count of Significant Features (S) | 9 | 2 | 5d ago |
| CA | WIMHF | Number of Significant Features (S) | 9 | 2 | 5d ago |
| Arena | WIMHF | Count of Significant Features (S) | 7 | 2 | 5d ago |