| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MT-Bench | CoDIT-Qwen3-8B | Average Score | 85.25 | 107 | 1d ago |
| MT-bench | Eagle3+LTD | Speedup | 4.64 | 76 | 14d ago |
| MT-bench | GTO | SR | 5.23 | 43 | 3mo ago |
| MT-Bench | GRPO (RM) | Conversation Rating (1-10) | 8.7 | 41 | 2mo ago |
| MT-Bench | RPO | Win Rate | 77.1 | 36 | 1d ago |
| MT-Bench | TALON | MAT Score | 7.29 | 14 | 3mo ago |
| ConvBench 1.0 (test) | GPT-4V | R1 (Pairwise) | 39.51 | 13 | 3mo ago |
| Long-MT-Bench+ | Rhea | Accuracy | 7.36 | 10 | 3mo ago |
| MT-Eval | Rhea | Accuracy | 8.28 | 9 | 3mo ago |
| MT-Bench | | Accuracy | 8.54 | 7 | 3mo ago |
| Multi-turn (Mt.) | ICPO | Mt. Score | 55.4 | 6 | 3mo ago |
| MT-Bench | EAGLE3 (Qwen3.5 9B) | SGLang Acceptance Length | 3.69 | 4 | 22d ago |
| Convbench | FLB | Win Rate (1st Turn) | 15.9 | 3 | 2mo ago |
| MT-Bench-101 | TCA-Attention | Grounding Score | 8.5 | 3 | 3mo ago |
| MT-Bench | C2PO | Accuracy | 82.7 | 2 | 3mo ago |