| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Value Alignment | HH Balance-8 | Conformity Score: 4.317 | 17 |
| Human Preference Alignment | HH (test) | Reward: 3.8764 | 14 |
| Response Generation | HH dataset | Reward: -0.96 | 13 |
| Harmfulness Evaluation | HH Harmless | Beaver-7B Cost Score: 3.25 | 10 |
| Preference Evaluation | HH-Helpful | Win Count: 52 | 8 |
| Model Discovery | HH | Avg NLL (Model): 25.18 | 6 |
| LLM-as-Judge evaluation | HH dataset | WCWR: 59.1 | 5 |
| Human Evaluation | HH dataset | Win Rate: 59 | 3 |
| Pairwise Judge Comparison | HH helpful | Win/Loss Count: 149 | 1 |