Banking77

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Intent Classification | Banking77 (test) | Accuracy 93.83 | 184 |
| Intent Classification | Banking77 | Accuracy 94.94 | 70 |
| Text Classification | BANKING77 Dir(0.01) (test) | Accuracy 85.98 | 45 |
| Intent Classification | BANKING77 5-shot (test) | Accuracy 79.09 | 20 |
| Intent Classification | BANKING77 10-shot | Accuracy 87.95 | 20 |
| Text Classification | BANKING77 Dir(0.1) (test) | Accuracy 92.14 | 17 |
| Text Classification | BANKING77 Dir(0.5) (test) | Accuracy 93.35 | 17 |
| Intent Classification | BANKING77 10-shot (test) | Accuracy 82.66 | 12 |
| Out-of-scope Intent Detection | Banking77 (B77) r=2.5% contamination (test) | AUC2 (10% Subset) 0.818 | 12 |
| Intent Classification | BANKING77 5-shot | Accuracy 78.9 | 11 |
| Out-of-Domain Detection | BANKING77 75% known ratio | Accuracy 84.4 | 8 |
| Out-of-Domain Detection | BANKING77 50% known ratio | Accuracy 83.78 | 8 |
| Out-of-Domain Detection | BANKING77 25% known ratio | Accuracy 85.71 | 8 |
| Selective Prediction | Banking77 ncal=6,468, delta=0.10, simulated confidence scores (test) | Accuracy (alpha=0.15) 90.6 | 7 |
| Topic Modeling | Banking77 | Purity 0.705 | 7 |
| Clustering | Banking77 ClusteringS2S | Accuracy 0.3 | 6 |
| Out-of-Scope Query Detection | Banking77-OOS (test) | AUC^2 (10%) 83.3 | 6 |
| Intent Classification | Banking77 1.0 (train) | Accuracy 93.8 | 6 |
| Intent Classification | Banking77 30 samples 1.0 (train) | Accuracy 0.9057 | 6 |
| Intent Classification | Banking77 10 samples 1.0 (train) | Accuracy 85.19 | 6 |
| Out-of-Distribution Intent Detection | Banking77 | F1-Macro 91.2 | 5 |
| Intent Classification | Banking77 | Top-1 Accuracy 87.3 | 4 |
| Uncertainty Calibration | Banking77 (test) | ECE 0.039 | 4 |
| Calibration | Banking77 | ECE 0.277 | 4 |