| Chronicle and MSC Average | EventWeave | CEA 70.3 | | 30 | 9d ago |
| MultiWOZ (test) | T5-Base | BLEU Score 35.1 | | 27 | 1mo ago |
| ReDial | KERL | Distinct-3 1.43 | | 17 | 1mo ago |
| HiCUPID | DeepSeek-R1-671B | Accuracy 63.9 | | 16 | 1mo ago |
| Ubuntu IRC | Qwen3-8B+DRCR | BLEU-1 17.81 | | 16 | 9d ago |
| Vicuna 80 prompts (test) | GPT-4 | Elo 1,348 | | 16 | 1mo ago |
| DailyDialog (test) | Hier | BLEU-2 35.4 | | 16 | 1mo ago |
| HH dataset | Alpaca | Reward -0.96 | | 13 | 1mo ago |
| Reddit multi-reference 6K (test) | DialoFlow | NIST-2 3.9 | | 9 | 1mo ago |
| EMPATHETICDIALOGUES (test) | CASE | PPL 35.37 | | 8 | 1mo ago |
| DSTC7 Shared Task (test) | UNILM | NIST-4 2.669 | | 8 | 1mo ago |
| Open Assistant 953 prompts (test) | GPT-4 | Elo Rating 1,294 | | 7 | 1mo ago |
| BusinessAI | RAGen | ROUGE-L 36.82 | | 6 | 4d ago |
| TradePolicy | RAGen | ROUGE-L 39.11 | | 6 | 4d ago |
| PPFS | RAGen | ROUGE-L 39.55 | | 6 | 4d ago |
| Dialogue dataset | | Coherence 3.67 | | 6 | 9d ago |
| ReDial v1 (test) | CR-Walker | BLEU 28 | | 6 | 1mo ago |
| Empathetic Dialogue | MCTS-Driven Knowledge Retrieval | Pairwise Diversity 75 | | 6 | 1mo ago |
| DailyDialog | MCTS-Driven Knowledge Retrieval | Pairwise Diversity 78.5 | | 6 | 1mo ago |
| ChattyChef (test) | | BLEU 5.4 | | 6 | 1mo ago |
| CamRest676 | LABES-S2S | Match Acc 96.4 | | 6 | 1mo ago |
| ReDial (Human Evaluation) | TREA | Relevance Score 2.43 | | 5 | 1mo ago |
| MI Counseling | KEMI | Perplexity (PPL) 13.84 | | 5 | 1mo ago |
| DuClarifyDial | PLATO-MT | BLEU-1 50 | | 5 | 1mo ago |
| In-Car | LABES-S2S | Match Score 0.866 | | 5 | 1mo ago |