| Dataset Name | SOTA Method | Metric | Value | Trend | Last Updated |
|---|---|---|---|---|---|
| MT-bench Second Turn 1.0 | GPT-4 (Pairwise) | Agreement Rate | 95 | 46 | 4d ago |
| MT-bench Second Turn | G4-Pair | Agreement | 95 | 32 | 4d ago |
| Chatbot Arena Random = 50% (S2) | GPT-3.5 (Pairwise) | Agreement | 96 | 10 | 4d ago |
| Chatbot Arena Random = 33% (S1) | GPT-4 (Pairwise) | Agreement Rate | 72 | 10 | 4d ago |