| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Dialogue Alignment Evaluation | AlignBench | Reasoning | 6.76 | 90 |
| Instruction Following | AlignBench | Reasoning Score | 7.42 | 60 |
| Pointwise Grading | AlignBench | Pearson (r) | 0.997 | 38 |
| General LLM Evaluation | AlignBench | Reasoning Score | 7.27 | 20 |
| Pairwise Comparison | AlignBench | Agreement | 74.69 | 18 |
| Subjective Alignment | AlignBench | Subjective Score (0-10) | 6.8 | 10 |
| Open-ended QA Response Ranking | AlignBench Minos | K Score | 47.68 | 9 |
| Alignment | AlignBench v1 (test) | Score | 7.21 | 5 |
| Alignment | AlignBench | Score | 8.27 | 4 |