| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reward Modeling | PKU-SafeRLHF (test) | MAE | 0.074 | 36 |
| Safety Alignment | PKU-SafeRLHF 30K (IID) | Win Rate (WR) | 89.26 | 36 |
| Safety Alignment Evaluation | PKU-SafeRLHF 30K (test) | Win Rate (WR) | 90.23 | 32 |
| Human Preference Alignment | PKU-SafeRLHF | BLEU | 0.324 | 31 |
| Safety Evaluation | PKU-SafeRLHF-V | Accuracy | 77.8 | 20 |
| Helpful and Harmless Response Generation | PKU-SafeRLHF alpaca2-7b (test) | Helpfulness | 8.53 | 14 |
| Safety Alignment | PKU-SafeRLHF | Gold Reward | 3.92 | 14 |
| LLM Safety Alignment | PKU-SafeRLHF full deduplicated (test) | Helpfulness | 8.22 | 12 |
| LLM Alignment | PKU-SafeRLHF | BWR (Median) | 49 | 12 |
| Best-of-N Alignment | PKU-SafeRLHF | Percent of batches with BWR > 0.50 | 38 | 12 |
| Combined Win Rate Evaluation | PKU-SafeRLHF Sev-Low prompts (n = 100 samples) | CVaR(0.125) Combined Win Rate | 53.6 | 10 |
| Combined Win Rate Evaluation | PKU-SafeRLHF Sev-3 prompts (n = 100 samples) | CVaR(0.125) Combined Win Rate | 60.9 | 10 |
| Combined Win Rate Evaluation | PKU-SafeRLHF Conflict prompts (n = 100 samples) | CVaR(0.125) Combined Win Rate | 40.3 | 10 |
| Combined Win Rate Evaluation | PKU-SafeRLHF Random prompts (n = 100 samples) | CVaR(0.125) Combined Win Rate | 37.1 | 10 |
| Safety Alignment Robustness Evaluation | PKU-SafeRLHF (n = 100 samples) | Random Rate | 18.5 | 10 |
| Safety Alignment | PKU-SafeRLHF in-distribution (test) | Accuracy (EN) | 99.44 | 10 |
| Harmfulness Evaluation | PKU-SafeRLHF | Beaver-7B-Cost Score | -1.11 | 10 |
| Privacy Violation Detection | PKU-SafeRLHF | Accuracy | 87.5 | 9 |
| Reward Model Transfer | PKU-SafeRLHF | AOG | 2.46 | 8 |
| Preference Evaluation | PKU-SafeRLHF | Win Rate | 57 | 8 |
| Safe RLHF Alignment | PKU-SafeRLHF 30K | Helpfulness | 6.51 | 7 |
| Safety and Informativeness Evaluation | PKU-SafeRLHF (test) | Drugs & Weapons Safety Score | 85.3 | 6 |
| Helpfulness | PKU-SafeRLHF 30K | Win Rate | 84.5 | 6 |
| Harmlessness | PKU-SafeRLHF 30K | Win Rate | 87.25 | 6 |
| Safety Detection | PKU-SafeRLHF (held-out) | AUROC | 90.6 | 5 |