| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| MT-Bench | | MT-Bench Score 8.91 | 73 | 5d ago | |
| AlpacaEval 2.0 (test) | | AlpacaEval (LC win %) 57.46 | 58 | 15d ago | |
| AlpacaEval | | Win Rate 3,213 | 39 | 3mo ago | |
| MT-Bench 1.0 (test) | Llama-3.1-Instruct | MT-Bench Score 8 | 19 | 3mo ago | |
| Alpaca | D-PACE | Success Rate (SR) 1.79 | 16 | 14d ago | |
| AlpacaEval LC 2 | Qwen 3 VL 32B Instruct | LC Win Rate 84.3 | 16 | 16d ago | |
| IFEval | | Loose Prompt Metric 48.8 | 15 | 3mo ago | |
| Alpaca | BASTION | Speedup 3.59 | 12 | 5d ago | |
| OMGEval (test) | | English Score 2,900 | 9 | 3mo ago | |
| AlpacaEval Length Controlled (test) | G-Zero | AlpLC Score 27.86 | 8 | 21d ago | |
| Chat | | Chat Score 49.3 | 8 | 3mo ago | |
| MT-Bench TH | Typhoon-S-8B | Overall Score 7.89 | 2 | 3mo ago | |
| MT-Bench EN | Qwen3-8B | Score 8.69 | 2 | 3mo ago | |