OBQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | OBQA | Accuracy 94.95 | 300 |
| Commonsense Reasoning | OBQA | Accuracy 89.2 | 117 |
| Multiple Choice Question Answering | OBQA | Accuracy 93.2 | 69 |
| Out-of-Distribution Detection | OBQA to MMLU | AUROC 87.09 | 41 |
| Question Answering | OBQA | Zero-shot Accuracy 35.2 | 36 |
| Reasoning | OBQA | Accuracy 31.6 | 26 |
| Uncertainty Estimation | OBQA | AUROC 88.03 | 24 |
| Commonsense Reasoning | OBQA | First-Token Accuracy 91.4 | 24 |
| Zero-shot Prediction | OBQA | Accuracy 31.4 | 17 |
| Multiple Choice Question Answering | OBQA (dev) | Accuracy 86.1 | 17 |
| Speech-to-Text Question-Answering | OBQA | Accuracy 83.08 | 16 |
| Commonsense Question Answering | OBQA | Accuracy 93.4 | 14 |
| Question Answering | OBQA | Accuracy 88.8 | 14 |
| Question Answering | OBQA (test) | Accuracy 60.2 | 13 |
| Common-sense reasoning | OBQA In-Distribution | Accuracy 88.43 | 12 |
| Reasoning | OBQA (leave-one-out setup) | Average Accuracy 87.7 | 12 |
| Question Answering | OBQA (out-of-domain) | Acc 95.59 | 12 |
| Question Answering | OBQA | Accuracy Improvement 2.01 | 12 |
| OpenBook Question Answering | OBQA | Accuracy 0.855 | 11 |
| Question Answering | OBQA | Accuracy (Normalized) 41.8 | 9 |
| Question Answering | OBQA in-distribution (test) | Accuracy 81.6 | 9 |
| Reasoning | OBQA (val) | Accuracy 39.6 | 9 |
| Multiple-choice science question answering | OBQA In-Distribution 64 | Accuracy 82.73 | 9 |
| Audio-conditioned reasoning | OBQA | Accuracy 77.74 | 8 |
| Downstream Task | OBQA | Accuracy 25.2 | 7 |
Showing 25 of 31 rows