| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Response Selection | ConvAI2 (dev) | R@1/20 90.3 | 25 |
| Dialogue Generation | CONVAI2 | BLEU 3.68 | 24 |
| Response Selection | ConvAI2 (test) | R@20 87.9 | 16 |
| Persona-based Dialogue | ConvAI2 (test) | Hits@1 90.68 | 10 |
| Open domain dialogue | ConvAI2 | RSR 38.6 | 9 |
| Dialogue Retrieval | ConvAI2 | R@1 90.7 | 9 |
| Personalized Dialogue Generation | ConvAI2 (Human Evaluation) | Readability 80 | 8 |
| Open-domain Conversation | NO-ConvAI2 NLEBench (test) | BLEU 4.28 | 7 |
| Personalized Dialogue Generation | ConvAI2 | BLEU-1 11.85 | 7 |
| Red Teaming | ConvAI2 (filtered hard positive) | RSR 2,120 | 7 |
| Open-domain dialogue red teaming | ConvAI2 filtered (test) | RSR 16.9 | 7 |
| Persona-based Dialogue Generation | ConvAI2 | Coherence 2.27 | 5 |
| Dialogue Evaluation | ConvAI2 (C2) | Perplexity 10.2 | 4 |
| Dialogue Generation | ConvAI2 (val) | F1 Score 21.7 | 4 |
| Attribute-Controlled Dialogue Generation | ConvAI2-CG (test) | Persona Consistency 2.17 | 3 |
| Red Teaming | ConvAI2 (test) | P Score 186 | 3 |
| Dialogue Response Generation | ConvAI2 (val) | F1 Score 20.72 | 3 |
| Dialogue Generation | ConvAI2 (test) | Persona Consistency 1.89 | 2 |