| Dataset Name | SOTA Method | Metric | Value | Trend | Updated |
|---|---|---|---|---|---|
| EmpatheticDialogues | MDD-C | Spearman Correlation | 0.404 | 19 | 1mo ago |
| DSTC9 Interactive Dialogue Evaluation Track (test) | Human Baseline | Human Rating | 5 | 12 | 1mo ago |
| Topical-Chat turn-level | UniEval (Dial) | Naturalness (Pearson r) | 0.444 | 11 | 1mo ago |
| ConsistentChat (test) | MDS | G-E Score | 7.3 | 10 | 8d ago |
| Banking (test) | MDS | G-E | 6.72 | 10 | 8d ago |
| Ice-breaker human evaluation 1.0 (test) | Model A | Overall Score | 0.552 | 10 | 1mo ago |
| Twitter-Eval | | Spearman Correlation | 0.301 | 10 | 1mo ago |
| Movie Eval | DEB | Spearman Correlation | 0.649 | 10 | 1mo ago |
| Topical-Eval | MDD-S | Spearman Correlation | 0.52 | 10 | 1mo ago |
| Persona-Eval | MDD-S | Spearman Correlation | 0.621 | 10 | 1mo ago |
| DailyDialog (eval) | MDD-S | Spearman Correlation | 0.579 | 10 | 1mo ago |
| ConvAI2 | QuantiDCE | Pearson Correlation | 0.554 | 9 | 1mo ago |
| TopicalChat | | CV (Understandability) | 0.48 | 7 | 1mo ago |
| USR-PersonaChat (test) | USL-H | Pearson Correlation (r) | 0.495 | 7 | 1mo ago |
| USR-TopicalChat (test) | RADE | Pearson Correlation (r) | 0.48 | 7 | 1mo ago |
| Chatbot Domain | | Correlation Score | 0.93 | 6 | 1mo ago |
| Human/Model Chats (test) | MMB Style | Engagement Score | 83 | 6 | 1mo ago |
| Empathetic Dialogues | | USR RET | 0.996 | 5 | 1mo ago |
| Wizard of Internet (WoI) | OPT-175B | Perplexity | 12 | 4 | 1mo ago |
| Blended Skill Talk (BST) | R2C2 BlenderBot | Perplexity | 11.7 | 4 | 1mo ago |
| Empathetic Dialogues (ED) | BlenderBot 1 | Perplexity | 9 | 4 | 1mo ago |
| Wizard of Wikipedia (WW) | R2C2 BlenderBot | Perplexity | 12.4 | 4 | 1mo ago |
| ConvAI2 (C2) | BlenderBot 1 | Perplexity | 10.2 | 4 | 1mo ago |
| PersonaChat | | USR RET | 97.7 | 4 | 1mo ago |
| DailyDialog | | USR RET | 0.998 | 4 | 1mo ago |