| Dataset | SOTA Method | Metric | Score | # Results | Last Updated |
|---|---|---|---|---|---|
| TL;DR (test) | CW-rDPO | Win Rate | 68.8 | 36 | 2mo ago |
| HH-RLHF (test) | CW-IPO | Win Rate | 87.4 | 36 | 1d ago |
| HH-RLHF | | ASR | 99.4 | 32 | 1mo ago |
| UF-P-4 | SPL | Accuracy (%) | 62.46 | 20 | 2mo ago |
| UF-P-2 | SPL | Accuracy | 63.71 | 20 | 2mo ago |
| PRISM | CUMA | Win Rate (DPO) | 74.5 | 20 | 3mo ago |
| UFB | CW-DPO | Win Rate | 83.2 | 18 | 2mo ago |
| UFB (test) | CW-DPO | Win Rate | 81.05 | 18 | 2mo ago |
| UltraFeedback | CLIPer (Vanilla Baseline) | Win Rate | 81 | 16 | 21d ago |
| Koala | CLIPer | Wins (Count) | 196 | 14 | 22d ago |
| AlignX UGC | | Accuracy | 58.76 | 14 | 1mo ago |
| AlignX PAIR | | Accuracy | 59.78 | 14 | 1mo ago |
| AlignX (DEMO) | | Accuracy | 92.51 | 14 | 1mo ago |
| AlignX (Arbitrary) | | Accuracy | 74.6 | 14 | 1mo ago |
| Psoups (test) | MetaAligner | Helpfulness (RM) | 1.39 | 13 | 3mo ago |
| Anthropic-hh-rlhf (test) | PLC | LLM-as-a-Judge Helpful Score | 5.83 | 12 | 1mo ago |
| AlpacaEval | AdaBoN | Win Rate | 52 | 12 | 2mo ago |
| UltraFeedback (40% flipping ratio) | FA-DPO | Accuracy | 78.87 | 12 | 3mo ago |
| UltraFeedback (20% flipping ratio) | FA-DPO | Accuracy | 78.8 | 12 | 3mo ago |
| UltraFeedback (test) | FedPDPO | Accuracy | 74.18 | 11 | 2mo ago |
| PyDPO (test) | FedPDPO | Accuracy | 94.32 | 11 | 2mo ago |
| WebGPT (test) | FedPDPO | Accuracy | 61.24 | 11 | 2mo ago |
| HH and UF Out-of-Domain (test) | TPMM-DPO | OOD Win Rate | 58.7 | 10 | 8d ago |
| HH and UF In-Domain (test) | TPMM-DPO | Win Rate | 68.4 | 10 | 8d ago |
| HelpSteer3 | DPO+Filter | Score | -5.89 | 10 | 21d ago |