| MultiWOZ (test) | T5-Base | BLEU Score 35.1 | | 27 | 4d ago |
| ReDial | KERL | Distinct-3 1.43 | | 17 | 4d ago |
| HiCUPID | DeepSeek-R1-671B | Accuracy 63.9 | | 16 | 4d ago |
| Vicuna 80 prompts (test) | GPT-4 | Elo 1,348 | | 16 | 4d ago |
| DailyDialog (test) | Hier | BLEU-2 35.4 | | 16 | 4d ago |
| HH dataset | Alpaca | Reward -0.96 | | 13 | 3d ago |
| Reddit multi-reference 6K (test) | DialoFlow | NIST-2 3.9 | | 9 | 4d ago |
| EMPATHETICDIALOGUES (test) | CASE | PPL 35.37 | | 8 | 4d ago |
| DSTC7 Shared Task (test) | UNILM | NIST-4 2.669 | | 8 | 2d ago |
| Open Assistant 953 prompts (test) | GPT-4 | Elo Rating 1,294 | | 7 | 4d ago |
| ReDial v1 (test) | CR-Walker | BLEU 28 | | 6 | 4d ago |
| Empathetic Dialogue | MCTS-Driven Knowledge Retrieval | Pairwise Diversity 75 | | 6 | 4d ago |
| DailyDialog | MCTS-Driven Knowledge Retrieval | Pairwise Diversity 78.5 | | 6 | 4d ago |
| ChattyChef (test) | | BLEU 5.4 | | 6 | 4d ago |
| CamRest676 | LABES-S2S | Match Acc 96.4 | | 6 | 4d ago |
| ReDial (Human Evaluation) | TREA | Relevance Score 2.43 | | 5 | 3d ago |
| MI Counseling | KEMI | Perplexity (PPL) 13.84 | | 5 | 4d ago |
| DuClarifyDial | PLATO-MT | BLEU-1 50 | | 5 | 4d ago |
| In-Car | LABES-S2S | Match Score 0.866 | | 5 | 4d ago |
| GoRecDial 1.0 (test) | CR-Walker | BLEU 29.6 | | 4 | 4d ago |
| University Admission Inquiries (test) | Fine-Tuned with RAG | Fact Recall 92.7 | | 4 | 4d ago |
| PhotoChat | EasyGen Vicuna | BLEU-1 23.6 | | 4 | 4d ago |
| Ubuntu IRC | HeterMPC_BERT | BLEU-1 12.61 | | 4 | 4d ago |
| AVSD | LTMI | CIDEr 85.1 | | 4 | 4d ago |
| HuggingGPT Human Evaluation Set 130 diverse requests (test) | HuggingGPT | Success Rate 63.08 | | 3 | 3d ago |