| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| HH-RLHF (test) | RE-CONTROL + Prompting | Win Rate | 80.3 | 21 | 4d ago |
| SHP | RE-CONTROL + Prompting | Diversity | 89.3 | 15 | 4d ago |
| Alpaca, BeaverTails, and TruthfulQA (test) | AlignX | Win Rate | 97.1 | 12 | 4d ago |
| Combined Suite Setup 3 | AMA Reweighting | Average Percentage Score | 54.38 | 9 | 4d ago |
| UltraFeedback (in-domain) | GEB-π | Win Rate (KL, alpha=1) | 80.6 | 8 | 4d ago |
| Honesty | AlignX | Truthfulness Index | 84 | 7 | 4d ago |
| Harmlessness | AlignX | WR | 87.85 | 7 | 4d ago |
| Helpfulness | AlignX | Truthfulness Index | 0.891 | 7 | 4d ago |
| Base Model Evaluation Set | AlignX | Win Rate | 79.93 | 6 | 4d ago |
| UltraFeedback 2023 (test) | MARS | Win-rate | 55 | 4 | 4d ago |
| PKU-SafeRLHF 2024 (test) | MARS | Win Rate | 0.58 | 4 | 4d ago |
| Anthropic HH-RLHF 2022 (test) | MARS | Win Rate | 62 | 4 | 4d ago |
| PKU-Safety (test) | DLMA-7B | Win Rate | 58 | 2 | 4d ago |
| HH-Harmless (test) | DLMA-7B | Win Rate | 59 | 2 | 4d ago |