| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Instruction Following | WildBench (test) | Info Seek | 58.6 | 27 |
| General Instruction Following | WildBench | Score | 92.6 | 19 |
| General chat | WildBench 2025 (test) | WB-Elo | 1,062.4 | 12 |
| Subjective Evaluation | WildBench | Score | 0.8604 | 5 |
| Open-ended text generation | WildBench | Score | -1.7 | 4 |
| General Language Model Evaluation | WildBench | WildBench Score | 26.95 | 2 |