| Dataset Name | SOTA Method | Metric | Trend | Last Updated |
|---|---|---|---|---|
| HH Golden 42.5k (test) | FA-DPO | Win Rate 88.9 | 30 | 3mo ago |
| Ultrafeedback 61.1k (test) | FA-DPO | Win Rate 69.8 | 30 | 3mo ago |
| PRBench | RUDE | Pearson r 0.8 | 1 | 21d ago |
| WritingBench | RUDE | Pearson r 0.62 | 1 | 21d ago |
| HealthBench | RUDE | Pearson r 0.67 | 1 | 21d ago |
| AdvancedIF | RUDE | Pearson r 0.91 | 1 | 21d ago |