| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| PRISM | CUMA | Win-Rate (DPO) | 74.5 | 20 | 4d ago |
| Psoups (test) | MetaAligner | Helpfulness (RM) | 1.39 | 13 | 4d ago |
| Ultrafeedback 40% flipping ratio | FA-DPO | Accuracy | 78.87 | 12 | 4d ago |
| Ultrafeedback 20% flipping ratio | FA-DPO | Accuracy | 78.8 | 12 | 4d ago |
| AlpacaEval weighted gpt4 turbo 2.0 | GANPO (SimPO) | Win Rate | 46.11 | 8 | 4d ago |
| Board Game Playtesting Dataset | MeepleLM | MAE | 0.6576 | 8 | 4d ago |
| CSQA | Pep | Preference Alignment | 78.2 | 5 | 4d ago |
| SocialIQA | Pep | Preference Alignment | 87.3 | 5 | 4d ago |
| AIME | Pep | Preference Alignment | 80.1 | 5 | 4d ago |
| MedQA | Pep | Preference Alignment | 77.4 | 5 | 4d ago |
| Argilla-7k (test) | MixDPO | LC Win Rate | 9.23 | 5 | 4d ago |
| PRISM 1.0 (test) | Hard Panel | Borda Average | 2.393 | 5 | 4d ago |
| PRISM normalized-step (test) | Hard Panel | Borda Avg | 2.328 | 5 | 4d ago |
| 15,000 listwise rankings (test) | Hard Panel | BT Score | 0.384 | 5 | 4d ago |
| PRISM 1.0 (full) | Hard Panel | Borda Avg Score | 2.459 | 5 | 4d ago |
| PKU-SafeRLHF (test) | Qwen-1.7B-DPO | Win Rate | 28.69 | 3 | 4d ago |
| AlpacaEval | - | - | - | 0 | 4d ago |