| Dataset Name | SOTA Method | Metric | Value | Results | Updated |
|---|---|---|---|---|---|
| MT-Bench | DOUBLE | Speedup | 4.1 | 80 | 16d ago |
| MT-bench | CORAL | Kendall's Tau | 5.25 | 54 | 1mo ago |
| MTBench101 | | Score | 9.03 | 33 | 1mo ago |
| ShareGPT, JDDC, and MedDG (aggregated) | | SR (avg) | 89.77 | 24 | 4d ago |
| MedDG | | Success Rate (SR) | 86.77 | 24 | 4d ago |
| JDDC | | Success Rate (SR) | 88.42 | 24 | 4d ago |
| ShareGPT | | Success Rate (SR) | 94.11 | 24 | 4d ago |
| Spec-Bench Multi. | SpecBound | CR | 3.22 | 21 | 4d ago |
| TopDial | | LLM-EVAL | 7.71 | 20 | 9d ago |
| ConsistentChat | MDS | LLM-EVAL Score | 8.52 | 20 | 9d ago |
| MT-Eval | MDS | LLM-EVAL Score | 8.16 | 20 | 9d ago |
| TSEData | ChatAD-Mistral-7B | Accuracy | 96.46 | 13 | 1mo ago |
| MT-Bench (MTB) | | Speedup Factor | 2.53 | 8 | 1mo ago |
| NPC-Chat (test) | AT-GRPO | Fluency | 3.84 | 8 | 1mo ago |
| ACEBench En | | MT Accuracy | 68 | 7 | 1mo ago |
| Honor-Dialogue | DVPO | Life Services Domain Performance | 88.13 | 6 | 1mo ago |
| ShareGPT (3 turns, 6491 tokens) | AdmTree | Perplexity (PPL) | 2.79 | 6 | 1mo ago |
| ShareGPT (2 turns, 3006 tokens) | AdmTree | Perplexity (PPL) | 2.91 | 6 | 1mo ago |
| ShareGPT (1 turn, 765 tokens) | AdmTree | Perplexity (PPL) | 4.01 | 6 | 1mo ago |