| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Response Selection | DSTC7 Track 1 (test) | Recall@1 (Top 100) | 91.1 | 27 |
| Dialogue Evaluation | DSTC9 Interactive Dialogue Evaluation Track (test) | Human Rating | 5 | 12 |
| Response Generation | DSTC7 Shared Task (test) | NIST-4 | 2.669 | 8 |
| Dialog | DSTC2 (test) | Average Error Rate | 48.9 | 7 |
| Theme Detection | DSTC Travel domain 12 (test) | Semantic Relevance (SR) | 89.7 | 6 |
| Conversation-level Topic Discovery and Labeling | DSTC-12 Travel domain (blind test) | Accuracy | 0.68 | 6 |
| Dialogue State Tracking | DSTC2 | Joint GA | 85 | 5 |
| Dialog act prediction | DSTC 4 (test) | Accuracy | 66.2 | 4 |
| Multi-intent Natural Language Understanding | DSTC4 | Slot F1 | 61.1 | 3 |
| Task-oriented dialogue | DSTC9 shared task Human Evaluation (test) | Avg Success Rate | 74.8 | 3 |