MASSIVE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Sequence Classification | MASSIVE | Micro F1: 80.36 | 64 |
| Short-text Clustering | Massive (test) | NMI: 78.88 | 20 |
| Text Classification | MASSIVE (test) | Accuracy: 78.6 | 18 |
| Intent Classification | MASSIVE (test) | In-Scope Accuracy: 89.47 | 17 |
| Slot Filling | MASSIVE Slotfill | F1: 57.3 | 14 |
| Classification | MassiveIntentClassification | Accuracy: 77.08 | 11 |
| Intent Classification | MASSIVE (unsupervised) | Accuracy: 79.15 | 9 |
| Selective Prediction | MASSIVE (test) | Guaranteed Test Coverage (alpha=0.10): 100 | 8 |
| Intent Classification | MASSIVE-Intent (test) | CFT Score: 80.73 | 8 |
| Slot Filling | MASSIVE-Slot (test) | CFT: 62.54 | 8 |
| Intent Classification | MASSIVE Intent | Accuracy: 80.7 | 8 |
| Intent Classification | MASSIVE | In-Scope Accuracy: 66 | 8 |
| Clustering | Massive-D | Accuracy: 57.6 | 7 |
| Clustering | Massive I | Accuracy: 60.5 | 7 |
| Intent Classification | MASSIVE W5H2 | Cost/1K: 0 | 7 |
| Intent Clustering | Massive (I) | NMI: 0.7812 | 6 |
| Text Classification | Massive | Label Quality: 68 | 5 |
| Out-of-Distribution Intent Detection | MASSIVE | F1-Macro: 87.6 | 5 |
| Intent Classification | MASSIVE (full) | F1-Macro: 87.6 | 5 |
| Intent Classification | MASSIVE W5H2 (test) | Accuracy: 97.3 | 4 |
| Out-of-Distribution Detection | Massive (test) | AUROC: 0.9679 | 4 |
| Uncertainty Calibration | MASSIVE (test) | ECE: 0.059 | 4 |
| Calibration | MASSIVE | ECE (Wrong Samples): 0.586 | 4 |
| Intent Clustering | MASSIVE (test) | ARI: 0.3 | 4 |
| Short-text Clustering | Massive | NMI: - | 0 |