| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| AlpacaEval 2.0 (test) | | AlpacaEval (LC win %) | 57.46 | 46 | 4d ago |
| MT-Bench | | MT-Bench Score | 8.1 | 30 | 4d ago |
| AlpacaEval | | Win Rate | 3,213 | 25 | 4d ago |
| MT-Bench 1.0 (test) | Llama-3.1-Instruct | MT-Bench Score | 8 | 19 | 4d ago |
| IFEval | | Loose Prompt Metric | 48.8 | 15 | 4d ago |
| AlpacaEval LC 2 | Qwen 3 VL 32B Instruct | LC Win Rate | 84.3 | 10 | 4d ago |
| OMGEval (test) | | English Score | 2,900 | 9 | 4d ago |
| Chat | | Chat Score | 49.3 | 8 | 4d ago |
| MT-Bench TH | Typhoon-S-8B | Overall Score | 7.89 | 2 | 4d ago |
| MT-Bench EN | Qwen3-8B | Score | 8.69 | 2 | 4d ago |