| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Human Preference Alignment | HH (test) | Reward: 3.8764 | 14 |
| Response Generation | HH dataset | Reward: -0.96 | 13 |
| Harmfulness Evaluation | HH Harmless | Beaver-7B Cost Score: 3.25 | 10 |
| Preference Evaluation | HH-Helpful | Win Count: 52 | 8 |
| Model Discovery | HH | Avg NLL (Model): 25.18 | 6 |
| LLM-as-Judge evaluation | HH dataset | WCWR: 59.1 | 5 |
| Human Evaluation | HH dataset | Win Rate: 59 | 3 |