
SAT

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Mathematical Reasoning | SAT-Math | SAT Math Accuracy | 97.66 | 47 |
| Spatial Mental Modeling | SAT-Real | AVG | 93.9 | 41 |
| Image Classification | SAT-6 (test) | Accuracy | 99.84 | 21 |
| Mathematical Reasoning | SAT | Accuracy | 98.2 | 18 |
| Reasoning | SAT | Accuracy (SAT) | 97.6 | 17 |
| Spatial Aptitude | SAT | Accuracy | 92 | 17 |
| Data Contamination Detection | SAT | F1 Score | 79 | 16 |
| Image Classification | SAT6 | Accuracy | 96.75 | 16 |
| Spatial Reasoning | SAT Real | Accuracy (Pass@1) | 68.67 | 15 |
| Off-policy evaluation for classification error | sat | Bias | -0.007 | 15 |
| Spatial Mental Modeling | SAT (synthesized) | EgoM | 95.4 | 15 |
| Analogy recognition | SAT | Accuracy | 60.78 | 15 |
| Visual Question Answering | SAT Real | Accuracy | 84.1 | 13 |
| Spatial Reasoning | SAT | Val Metric Score | 87.7 | 12 |
| Visual Understanding | SAT | Accuracy | 73.3 | 11 |
| Spatial Reasoning | SAT | Overall Acc | 80 | 11 |
| Spatial Reasoning | SAT ood (test) | Accuracy | 79.7 | 11 |
| Analogy Generation | SAT (test) | Accuracy | 91 | 11 |
| Analogy Generation | SAT | Accuracy | 0.91 | 11 |
| Spatial Understanding | SAT | Score | 88 | 10 |
| 3D/4D Video Question Answering | SAT | Accuracy | 64.8 | 8 |
| Spatial Reasoning | SAT iid (val) | Accuracy | 92.7 | 8 |
| Spatial Reasoning | SAT (test) | Accuracy | 75.33 | 7 |
| Spatial Reasoning | SAT (val) | Accuracy | 93.48 | 7 |
| 3D Task | SAT | Accuracy | 75.33 | 7 |
Showing 25 of 48 rows