| Dataset Name | SOTA Method | Metric | Score | Trend | Last Updated |
|---|---|---|---|---|---|
| EmpatheticDialogues | MDD-C | Spearman Correlation | 0.404 | 19 | 4d ago |
| DSTC9 Interactive Dialogue Evaluation Track (test) | Human Baseline | Human Rating | 5 | 12 | 4d ago |
| Topical-Chat turn-level | UniEval (Dial) | Naturalness (Pearson r) | 0.444 | 11 | 4d ago |
| Ice-breaker human evaluation 1.0 (test) | Model A | Overall Score | 0.552 | 10 | 4d ago |
| Twitter-Eval |  | Spearman Correlation | 0.301 | 10 | 4d ago |
| Movie Eval | DEB | Spearman Correlation | 0.649 | 10 | 4d ago |
| Topical-Eval | MDD-S | Spearman Correlation | 0.52 | 10 | 4d ago |
| Persona-Eval | MDD-S | Spearman Correlation | 0.621 | 10 | 4d ago |
| DailyDialog (eval) | MDD-S | Spearman Correlation | 0.579 | 10 | 4d ago |
| ConvAI2 | QuantiDCE | Pearson Correlation | 0.554 | 9 | 4d ago |
| TopicalChat |  | CV (Understandability) | 0.48 | 7 | 4d ago |
| USR-PersonaChat (test) | USL-H | Pearson Correlation (r) | 0.495 | 7 | 4d ago |
| USR-TopicalChat (test) | RADE | Pearson Correlation (r) | 0.48 | 7 | 4d ago |
| Chatbot Domain |  | Correlation Score | 0.93 | 6 | 4d ago |
| Human/Model Chats (test) | MMB Style | Engagement Score | 83 | 6 | 3d ago |
| Empathetic Dialogues |  | USR RET | 0.996 | 5 | 4d ago |
| Wizard of Internet (WoI) | OPT-175B | Perplexity | 12 | 4 | 4d ago |
| Blended Skill Talk (BST) | R2C2 BlenderBot | Perplexity | 11.7 | 4 | 4d ago |
| Empathetic Dialogues (ED) | BlenderBot 1 | Perplexity | 9 | 4 | 4d ago |
| Wizard of Wikipedia (WW) | R2C2 BlenderBot | Perplexity | 12.4 | 4 | 4d ago |
| ConvAI2 (C2) | BlenderBot 1 | Perplexity | 10.2 | 4 | 4d ago |
| PersonaChat |  | USR RET | 97.7 | 4 | 4d ago |
| DailyDialog |  | USR RET | 0.998 | 4 | 4d ago |
| ACUTE-Eval Human-Chat (test) | BlenderBot | Engagingness | 75 | 4 | 4d ago |
| Soccer dialogue | DialoGPT | SentBLEU | 0.04 | 3 | 4d ago |
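Many rows in this table report a correlation (Spearman or Pearson). On dialogue-evaluation benchmarks this typically measures how closely an automatic metric's per-response scores track human ratings of the same responses. The perplexity rows instead score a generative model directly. The sketch below computes all three quantities with the standard library only. The function names (`pearson`, `average_ranks`, `spearman`, `perplexity`) and the toy scores are hypothetical illustrations, not code or data from any method listed above.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r: linear correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho: Pearson r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative natural-log likelihood per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Hypothetical metric scores and human ratings for five responses.
    metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7]
    human_ratings = [1, 3, 2, 5, 4]
    print(round(pearson(metric_scores, human_ratings), 3))   # 0.995
    print(round(spearman(metric_scores, human_ratings), 3))  # 1.0 (identical orderings)
    # A model assigning probability 0.25 to each of 4 tokens has perplexity 4.
    print(round(perplexity([math.log(0.25)] * 4), 6))        # 4.0
```

Spearman is implemented as Pearson on ranks, with ties given the average rank, which matches the default `"average"` tie handling of `scipy.stats.rankdata`. Because it depends only on ordering, a metric can reach a Spearman correlation of 1.0 with human ratings while its Pearson r stays below 1.0, as the demo shows. Perplexity depends on the tokenizer and the log base, so values computed under different tokenizations are generally not directly comparable.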