| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Sequence Classification | MASSIVE | Micro F1 | 80.36 | 64 |
| Short-text Clustering | Massive (test) | NMI | 78.88 | 20 |
| Text Classification | MASSIVE (test) | Accuracy | 78.6 | 18 |
| Intent Classification | MASSIVE (test) | In-Scope Accuracy | 89.47 | 17 |
| Slot Filling | MASSIVE Slotfill | F1 | 57.3 | 14 |
| Classification | MassiveIntentClassification | Accuracy | 77.08 | 11 |
| Intent Classification | MASSIVE (unsupervised) | Accuracy | 79.15 | 9 |
| Selective Prediction | MASSIVE (test) | Guaranteed Test Coverage (alpha=0.10) | 100 | 8 |
| Intent Classification | MASSIVE-Intent (test) | CFT Score | 80.73 | 8 |
| Slot Filling | MASSIVE-Slot (test) | CFT | 62.54 | 8 |
| Intent Classification | MASSIVE Intent | Accuracy | 80.7 | 8 |
| Intent Classification | MASSIVE | In-Scope Accuracy | 66 | 8 |
| Clustering | Massive-D | Accuracy | 57.6 | 7 |
| Clustering | Massive I | Accuracy | 60.5 | 7 |
| Intent Classification | MASSIVE W5H2 | Cost/1K | 0 | 7 |
| Intent Clustering | Massive (I) | NMI | 0.7812 | 6 |
| Text Classification | Massive | Label Quality | 68 | 5 |
| Out-of-Distribution Intent Detection | MASSIVE | F1-Macro | 87.6 | 5 |
| Intent Classification | MASSIVE (full) | F1-Macro | 87.6 | 5 |
| Intent Classification | MASSIVE W5H2 (test) | Accuracy | 97.3 | 4 |
| Out-of-Distribution Detection | Massive (test) | AUROC | 0.9679 | 4 |
| Uncertainty Calibration | MASSIVE (test) | ECE | 0.059 | 4 |
| Calibration | MASSIVE | ECE (Wrong Samples) | 0.586 | 4 |
| Intent Clustering | MASSIVE (test) | ARI | 0.3 | 4 |
| Short-text Clustering | Massive | NMI | - | 0 |