| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| MT-Bench | GRPO (RM) | Conversation Rating (1-10) | 8.7 | 41 | 4d ago |
| MT-Bench | TALON | MAT Score | 7.29 | 14 | 3d ago |
| ConvBench 1.0 (test) | GPT-4V | R1 (Pairwise) | 39.51 | 13 | 3d ago |
| Long-MT-Bench+ | Rhea | Accuracy | 7.36 | 10 | 4d ago |
| MT-Eval | Rhea | Accuracy | 8.28 | 9 | 4d ago |
| MT-Bench | | Accuracy | 8.54 | 7 | 4d ago |
| Multi-turn (Mt.) | ICPO | Mt. Score | 55.4 | 6 | 3d ago |
| MT-Bench-101 | TCA-Attention | Grounding Score | 8.5 | 3 | 3d ago |
| MT-Bench | C2PO | Accuracy | 82.7 | 2 | 4d ago |