| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Helpfulness alignment | Anthropic hh-rlhf | Gold Reward: 3.36 | 14 |
| Preference Alignment | Anthropic-hh-rlhf (test) | LLM-as-a-Judge Helpful Score: 5.83 | 12 |
| Reward Modeling | Anthropic/hh-rlhf HH-helpful core250 | Delta RM: 0.292 | 6 |
| Response Diversity | Anthropic HH-RLHF | Preference Coverage: 82.5 | 6 |
| LLM Alignment | Anthropic HH-RLHF 2022 (test) | Win Rate: 62 | 4 |
| Preference Learning | Anthropic HH-RLHF+VI Preference (test) | Overall Accuracy: 64 | 3 |