| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| LLM Ranking | LMArena-based simulation | Top-8 Ranked Candidates | -2.5 | 9 |
| User Preference | LMArena | Win Rate vs. Generative UI | 50 | 4 |
| Expert Preference Pairwise | LMArena Expert Med | Kendall's tau_b | 0.476 | 3 |
| General Preference Pairwise | LMArena Med | Kendall's tau_b | 0.594 | 3 |