| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Safety Alignment | HH-RLHF | MD Rate | 1.09 | 36 |
| Assistant Response Alignment (Helpfulness and Harmlessness) | HH-RLHF (test) | Helpfulness Win Rate | 89.42 | 31 |
| Safety Evaluation | HH-RLHF (test) | Harm Score | 1.02 | 21 |
| LLM Alignment | HH-RLHF (test) | Win Rate | 80.3 | 21 |
| RLHF | HH-RLHF | Human Win Rate | 74 | 16 |
| Reward Model Verification | HH-RLHF | Win Rate | 47.3 | 12 |
| Harmlessness Evaluation | HH-RLHF harmless (test) | Win Rate | 83.33 | 12 |
| Certified Poisoning Stability | HH-RLHF | FTS@1 | 100 | 9 |
| Dialogue Generation | full-hh-rlhf (test) | Win Rate (Beaver-7b-v3.0-reward) | 79.3 | 8 |
| Helpfulness Evaluation | HH-RLHF helpful (test) | Helpfulness Fraction | 77 | 7 |
| Validity Certification | HH-RLHF (test) | FTV@k=1 | 100 | 6 |
| Constitutional AI Alignment | HH-RLHF (test) | Likert Score Ranking | 4.596 | 6 |
| Controllable Multi-Objective Generation | HH-RLHF Helpful vs Harmless (test) | Hypervolume | 1.24 | 6 |
| Humor | HH-RLHF (test) | Reward | 2.481 | 4 |
| Harmlessness | HH-RLHF (test) | Reward | 2.772 | 4 |
| Helpfulness | HH-RLHF (test) | Reward | 2.542 | 4 |
| Controllable Multi-Objective Generation | HH-RLHF Helpful vs Humor (test) | Hypervolume | 1.24 | 4 |
| Conversational Assistant | HH-RLHF | Reward | 0.5 | 3 |