| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MT-Bench | LLAMA2 | Overall Score | 62.7 | 331 | 4d ago |
| MT-Bench-zh | TaP-SFT (GPT-4) | Score | 6.34 | 90 | 4d ago |
| BotChat | ISM | Success Rate (N=16) | 86.3 | 9 | 4d ago |
| MT-Eval | | Expansion Score | 7.34 | 9 | 4d ago |
| MT-Bench | GPT-4 | MT-Bench Score | 8.99 | 5 | 4d ago |
| MT-Bench Turn-2 | LoRA | Writing Score | 1.9 | 3 | 4d ago |
| ruMT-Bench 1 (test) | T-pro 2.0 | Speedup | 1.79 | 2 | 4d ago |