| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Preference Classification | Anthropic HH Harmless (test) | Accuracy | 71.7 | 22 |
| Dialogue Generation | Anthropic-HH (test) | Average Preference Score | 69.07 | 16 |
| Dialogue | Anthropic-HH (distillation set) | Response Word Count | 73.53 | 16 |
| Single-turn dialogue | Anthropic HH | Win Rate | 69.18 | 12 |
| Preference Classification | Anthropic HH Helpful (test) | Accuracy | 57.6 | 7 |
| Reward Modeling | Anthropic HH (test) | Accuracy | 68.49 | 5 |
| Sycophancy Bias Detection | Anthropic-HH | AUC | 0.711 | 5 |
| Length Bias Detection | Anthropic-HH | AUC | 80 | 5 |
| Instruction Tuning | Anthropic HH (test) | Win Rate | 56.3 | 2 |
| Instruction Tuning | Anthropic HH-RLHF (test) | - | - | 0 |