| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Dialog State Tracking | SGD 15 tasks CL | Avg JGA 76.3 | 23 |
| Natural Language Generation | SGD (test) | BLEU 28.6 | 18 |
| User Satisfaction Estimation | SGD | Accuracy 64.8 | 14 |
| Task-oriented Dialogue | FewShotSGD unseen schemata (test) | BLEU 28.76 | 13 |
| Task-oriented Dialogue | FewShotSGD seen schemata (test) | BLEU 29.28 | 13 |
| Dialogue State Tracking | SGD (test) | JGA 86.5 | 11 |
| Dialog Structure Induction | SGD (test) | Purity 46.8 | 9 |
| Dialogue State Tracking | SGD | JGA (Payment) 24.7 | 8 |
| User Satisfaction Estimation | SGD 5% training size (test) | Precision 75.3 | 8 |
| Task-Oriented Dialogue | SGD 1.0 (test) | Inform Rate 81.29 | 6 |
| Structure Induction | SGD Real (test) | AMI 0.559 | 6 |
| Hidden Representation Learning | SGD Real (test) | Class-Balanced Acc (Full) 66.3 | 6 |
| Dialogue State Tracking | SGD-X v1-v5 variants (test) | Joint Goal Acc (Original) 86.4 | 6 |
| Goal completion | SGD (test) | Inform Rate 50.4 | 5 |
| Dialogue State Tracking | SGD to MultiWoz (test) | Average JGA 51.2 | 5 |
| Dialogue State Tracking | SGD All Domains (test) | Joint GA 32.1 | 4 |
| Dialogue State Tracking | SGD Unseen Domains (test) | Joint GA 24.4 | 4 |
| Natural Language Generation | SGD (Overall) | Naturalness 2.46 | 4 |
| Natural Language Generation | SGD Seen domains | Naturalness 2.48 | 4 |
| Natural Language Generation | SGD Unseen domains | Naturalness 2.46 | 4 |
| Natural Language Generation | SGD 1.0 (overall) | BLEU 28.6 | 4 |
| Natural Language Generation | SGD 1.0 (unseen domains) | BLEU Score 22.2 | 4 |
| Natural Language Generation | SGD seen domains 1.0 | BLEU 29.4 | 4 |
| Dialog Structure Induction | SGD Synthetic (test) | Purity 81 | 3 |
| Action Selection Task (AST) | SGD (out-of-distribution) | B-Slot Acc 61.3 | 3 |