| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| MT-Bench | Qwen3-Omni-30B-A3B-Thinking | MT-Bench Score | 76.19 | 126 | 15d ago |
| MT-Bench | DOUBLE | Speedup | 4.1 | 80 | 2mo ago |
| MT-bench | CORAL | Kendall's Tau | 5.25 | 54 | 3mo ago |
| MT-Bench | EAGLE-3 | Speedup | 3.22 | 44 | 5d ago |
| MT-Bench | UAPO | GPT-4 Score | 8.9 | 34 | 8d ago |
| MTBench101 | | Score | 9.03 | 33 | 3mo ago |
| MT-Bench | Graft | MAT Score | 6.93 | 30 | 14d ago |
| ShareGPT, JDDC, and MedDG Aggregated | | SRavg | 89.77 | 24 | 1mo ago |
| MedDG | | Success Rate (SR) | 86.77 | 24 | 1mo ago |
| JDDC | | Success Rate (SR) | 88.42 | 24 | 1mo ago |
| ShareGPT | | Success Rate (SR) | 94.11 | 24 | 1mo ago |
| Spec-Bench Multi. | SpecBound | CR | 3.22 | 21 | 1mo ago |
| MT-Bench | PPOW | Acceptance Length (τ) | 5.78 | 20 | 19d ago |
| MT-Bench | EVICT | Tokens/s | 270.96 | 20 | 1mo ago |
| TopDial | | LLM-EVAL | 7.71 | 20 | 1mo ago |
| ConsistentChat | MDS | LLM-EVAL Score | 8.52 | 20 | 1mo ago |
| MT-Eval | MDS | LLM-EVAL Score | 8.16 | 20 | 1mo ago |
| TSEData | ChatAD-Mistral-7B | Accuracy | 96.46 | 13 | 3mo ago |
| MT-Bench | BASTION | Speedup | 4.55 | 12 | 5d ago |
| SpokenWoz | Gemini2.5-Flash | Joint Goal Accuracy (JGA) | 52.09 | 11 | 22d ago |
| MT-Bench (MTB) | | Speedup Factor | 2.53 | 8 | 3mo ago |
| NPC-Chat (test) | AT-GRPO | Fluency | 3.84 | 8 | 3mo ago |
| ACEBench En | | MT Accuracy | 68 | 7 | 3mo ago |
| Honor-Dialogue | DVPO | Life Services Domain Performance | 88.13 | 6 | 3mo ago |
| ShareGPT 3 Turn 6491 tokens | AdmTree | PPL | 2.79 | 6 | 3mo ago |