SAT

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Image Classification | SAT | Accuracy | 0.9858 | 56 |
| Mathematical Reasoning | SAT-Math | SAT Math Accuracy | 97.66 | 47 |
| Spatial Mental Modeling | SAT-Real | AVG | 93.9 | 41 |
| Spatial Understanding | SAT | Score | 88 | 23 |
| Spatial Reasoning | SAT Real | Accuracy (Pass@1) | 72.67 | 21 |
| Image Classification | SAT-6 (test) | Accuracy | 99.84 | 21 |
| Mathematical Reasoning | SAT | Accuracy | 98.2 | 18 |
| Reasoning | SAT | Accuracy (SAT) | 97.6 | 17 |
| Spatial Aptitude | SAT | Accuracy | 92 | 17 |
| Data Contamination Detection | SAT | F1 Score | 79 | 16 |
| Image Classification | SAT6 | Accuracy | 96.75 | 16 |
| Off-policy evaluation for classification error | sat | Bias | -0.007 | 15 |
| Spatial Mental Modeling | SAT (synthesized) | EgoM | 95.4 | 15 |
| Analogy recognition | SAT | Accuracy | 60.78 | 15 |
| Visual Question Answering | SAT Real | Accuracy | 84.1 | 13 |
| Spatial Reasoning | SAT | Val Metric Score | 87.7 | 12 |
| Visual Understanding | SAT | Accuracy | 73.3 | 11 |
| Spatial Reasoning | SAT | Overall Acc | 80 | 11 |
| Spatial Reasoning | SAT ood (test) | Accuracy | 79.7 | 11 |
| Analogy Generation | SAT (test) | Accuracy | 91 | 11 |
| Analogy Generation | SAT | Accuracy | 0.91 | 11 |
| SAT Solving | SAT n=50 planted (alpha=4.0) | Solve Percentage | 100 | 8 |
| 3D/4D Video Question Answering | SAT | Accuracy | 64.8 | 8 |
| Spatial Reasoning | SAT iid (val) | Accuracy | 92.7 | 8 |
| STEM Reasoning | SAT | Accuracy | 0.893 | 7 |
Showing 25 of 55 rows