| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| Alpaca Eval | AMA Reweighting | Alpaca Eval (%) | 17.77 | 22 | 1mo ago |
| HHH | STAIR | Accuracy | 90.71 | 20 | 1mo ago |
| GSM8k | DR-IRL | Accuracy | 89.7 | 20 | 1mo ago |
| AdvGLUE | DR-IRL | Accuracy | 75.15 | 20 | 1mo ago |
| SimpleQA | DR-IRL | Accuracy | 6.64 | 20 | 1mo ago |
| PKU-SafeRLHF 30K | SafeDPO | Win Rate | 84.5 | 6 | 1mo ago |
| GPT-4 Evaluation Template T2 (overall) | SafeDPO | Win Rate | 91.6 | 5 | 1mo ago |
| Template T3 GPT-4 evaluation (test) | SafeDPO | Win Rate | 91.62 | 5 | 1mo ago |
| HH-RLHF (test) | MAVIS | Reward | 2.542 | 4 | 1mo ago |