| Chronicle and MSC Average | EventWeave | CEA 70.3 | | 30 | 1mo ago |
| MultiWOZ (test) | T5-Base | BLEU Score 35.1 | | 27 | 3mo ago |
| UltraFeedback (val) | RPO | BERTScore 88.1 | | 24 | 1d ago |
| ReDial | KERL | Distinct-3 1.43 | | 17 | 3mo ago |
| HiCUPID | DeepSeek-R1-671B | Accuracy 63.9 | | 16 | 3mo ago |
| Ubuntu IRC | Qwen3-8B+DRCR | BLEU-1 17.81 | | 16 | 1mo ago |
| Vicuna 80 prompts (test) | GPT-4 | Elo 1,348 | | 16 | 3mo ago |
| DailyDialog (test) | Hier | BLEU-2 35.4 | | 16 | 3mo ago |
| SCREEN (test) | SiPeR | BLEU-1 49.5 | | 13 | 1mo ago |
| SIMMC 2.1 (test) | SiPeR | BLEU-1 33.77 | | 13 | 1mo ago |
| HH dataset | Alpaca | Reward -0.96 | | 13 | 3mo ago |
| ESConv (test) | FiSMiness | Fluency 3.9 | | 10 | 23d ago |
| Reddit multi-reference 6K (test) | DialoFlow | NIST-2 3.9 | | 9 | 3mo ago |
| EMPATHETICDIALOGUES (test) | CASE | PPL 35.37 | | 8 | 3mo ago |
| DSTC7 Shared Task (test) | UNILM | NIST-4 2.669 | | 8 | 3mo ago |
| DecTest resp_gen no_hds (1000 samples) | a_n | Spearman ρ 0.924 | | 7 | 1d ago |
| CS Resp. (test) | | BS 72.4 | | 7 | 1mo ago |
| Open Assistant 953 prompts (test) | GPT-4 | Elo Rating 1,294 | | 7 | 3mo ago |
| BusinessAI | RAGen | ROUGE-L 36.82 | | 6 | 1mo ago |
| TradePolicy | RAGen | ROUGE-L 39.11 | | 6 | 1mo ago |
| PPFS | RAGen | ROUGE-L 39.55 | | 6 | 1mo ago |
| Dialogue dataset | | Coherence 3.67 | | 6 | 1mo ago |
| ReDial v1 (test) | CR-Walker | BLEU 28 | | 6 | 3mo ago |
| Empathetic Dialogue | MCTS-Driven Knowledge Retrieval | Pairwise Diversity 75 | | 6 | 3mo ago |
| DailyDialog | MCTS-Driven Knowledge Retrieval | Pairwise Diversity 78.5 | | 6 | 3mo ago |