| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MT-bench | GTO | SR | 5.23 | 43 | 1mo ago |
| MT-Bench | GRPO (RM) | Conversation Rating (1-10) | 8.7 | 41 | 1mo ago |
| MT-bench | Eagle3+LTD | Speedup | 4.64 | 25 | 1mo ago |
| MT-Bench | CoDIT-Qwen3-8B | Average Score | 85.25 | 18 | 2d ago |
| MT-Bench | TALON | MAT Score | 7.29 | 14 | 1mo ago |
| ConvBench 1.0 (test) | GPT-4V | R1 (Pairwise) | 39.51 | 13 | 1mo ago |
| Long-MT-Bench+ | Rhea | Accuracy | 7.36 | 10 | 1mo ago |
| MT-Eval | Rhea | Accuracy | 8.28 | 9 | 1mo ago |
| MT-Bench | | Accuracy | 8.54 | 7 | 1mo ago |
| Multi-turn (Mt.) | ICPO | Mt. Score | 55.4 | 6 | 1mo ago |
| Convbench | FLB | Win Rate (1st Turn) | 15.9 | 3 | 16d ago |
| MT-Bench-101 | TCA-Attention | Grounding Score | 8.5 | 3 | 1mo ago |
| MT-Bench | C2PO | Accuracy | 82.7 | 2 | 1mo ago |