
Classification Datasets

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Concept Extraction Evaluation | 4 classification datasets average | RAcc | 99.8 | 35 |
| Tabular Classification | 53 classification datasets (unseen) | Mean Accuracy | 75.42 | 18 |
| Zero-shot Classification | Classification Datasets (MMLU, OBQA, ARC-e, WinoGrande, ARC-c, PIQA, HellaSwag) | MMLU (5-shot) | 37.1 | 18 |
| Classification | 80 classification datasets | Median Effect Size (F1 pts) | 0.11 | 17 |
| Open-Vocabulary Classification | 11 classification datasets (test) | ImageNet Accuracy | 76.77 | 16 |
| Classification | medium-sized classification datasets | Accuracy | 78.58 | 14 |
| Selective Prediction | Classification Datasets Average (test) | NAURC | 72.5 | 12 |
| Classification | Classification Datasets | Accuracy | 100 | 10 |
| Classification | 25 Classification Datasets | Mean Accuracy | 89.1 | 10 |
| Classification | 7 classification datasets (Iris, Wine, Breast Cancer, Digits, etc.) (cross-validation) | Accuracy | 91.17 | 10 |
| Tabular Classification | 50 classification datasets | Mean Accuracy | 84.36 | 10 |
| Classification | 6 out-of-domain classification datasets (test) | Accuracy | 65.2 | 9 |
| Tabular Data Generation | Classification Datasets | Avg. JSD | 0.05 | 2 |
| Classification | 15 Classification Datasets | TabMixNN Wins | 1,067 | 1 |