| MultiWOZ 2.0 (test) | DarwinTOD | Inform Rate 99.1 | | 37 | 4d ago |
| Stanford Multi-Domain Dialogue (SMD) (test) | COMET | BLEU 17.3 | | 29 | 4d ago |
| MultiWOZ 2.2 (test) | DarwinTOD | Inform Rate 96.48 | | 23 | 4d ago |
| CamRest676 end-to-end modeling (test) | ARDM | Task Success Rate 87.1 | | 18 | 4d ago |
| MultiWOZ 2.4 (test) | CoALM 70B | JGA 43.8 | | 15 | 3d ago |
| FewShotSGD unseen schemata (test) | Full Data Baseline | BLEU 28.76 | | 13 | 4d ago |
| FewShotSGD seen schemata (test) | Full Data Baseline | BLEU 29.28 | | 13 | 4d ago |
| FewShotWeather unseen structures (test) | Full Data Baseline | BLEU 62.44 | | 13 | 4d ago |
| FewShotWeather seen structures (test) | Full Data Baseline | BLEU 74.43 | | 13 | 4d ago |
| MultiWOZ 2.0 | LAVA | Inform Rate 91.8 | | 13 | 4d ago |
| MultiWOZ 2.1 (test) | DarwinTOD | Inform Rate 99.62 | | 11 | 4d ago |
| MultiWOZ 20% 2.0 (train) | MDTOD | Inform 90.25 | | 10 | 4d ago |
| MultiWOZ 2.0 (10% train) | MDTOD | Inform Rate 86.3 | | 10 | 4d ago |
| MultiWOZ 5% 2.0 (train) | MDTOD | Inform 85.65 | | 10 | 4d ago |
| MultiWOZ Taxi domain 1.0 | EWC+RL | Combined Score 106.3 | | 10 | 4d ago |
| MultiWOZ 1.0 (train) | EWC+RL | Combined Score 104.9 | | 10 | 4d ago |
| MultiWOZ Attraction domain 1.0 | EWC+RL | Combined Score 96 | | 10 | 4d ago |
| MultiWOZ Hotel domain 1.0 | EWC+RL | Combined Score 100.7 | | 10 | 4d ago |
| MultiWOZ Restaurant domain 1.0 | Naive+RL | Combined Score 97.6 | | 10 | 4d ago |
| StarV2 (test) | Genie + GPT-4o-mini | Bank Score 82.5 | | 10 | 4d ago |
| SMCalFlow (test) | Noisy channel online decoding | SacreBLEU 64.29 | | 9 | 4d ago |
| SGD 1.0 (test) | DarwinTOD | Inform Rate 81.29 | | 6 | 4d ago |
| Real-world Dialogue Benchmark (combined) | | Life Services TC 72.5 | | 6 | 4d ago |
| MultiWOZ (Human Evaluation) | SOLOIST | Success Rate 91.67 | | 5 | 4d ago |
| KVRET | T5-3B | Micro F1 67.88 | | 4 | 4d ago |