
OBQA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | OBQA | Accuracy 94.95 | 347 |
| Commonsense Reasoning | OBQA | Accuracy 89.2 | 187 |
| Question Answering | OBQA (test) | Accuracy 71.6 | 80 |
| Multiple Choice Question Answering | OBQA | Accuracy 93.2 | 79 |
| Reasoning | OBQA | Accuracy 97.67 | 46 |
| Out-of-Distribution Detection | OBQA to MMLU | AUROC 87.09 | 41 |
| Question Answering | OBQA | Zero-shot Accuracy 35.2 | 36 |
| OpenBook Question Answering | OBQA | Accuracy 44.2 | 32 |
| Question Answering | OBQA | Accuracy (Normalized) 41.8 | 29 |
| Uncertainty Estimation | OBQA | AUROC 88.03 | 24 |
| Commonsense Reasoning | OBQA | First-Token Accuracy 91.4 | 24 |
| Multiple Choice Question Answering | OBQA (test) | Accuracy 71.6 | 21 |
| Commonsense Question Answering | OBQA | Accuracy 93.4 | 19 |
| Zero-shot Prediction | OBQA | Accuracy 31.4 | 17 |
| Multiple Choice Question Answering | OBQA (dev) | Accuracy 86.1 | 17 |
| Speech-to-Text Question-Answering | OBQA | Accuracy 83.08 | 16 |
| Question Answering | OBQA | Accuracy 88.8 | 14 |
| Common-sense reasoning | OBQA In-Distribution | Accuracy 88.43 | 12 |
| Reasoning | OBQA (leave-one-out setup) | Average Accuracy 87.7 | 12 |
| Question Answering | OBQA | Accuracy 90.1 | 12 |
| Question Answering | OBQA (out-of-domain) | Acc 95.59 | 12 |
| Question Answering | OBQA | Accuracy Improvement 2.01 | 12 |
| Question Answering | OBQA in-distribution (test) | Accuracy 81.6 | 9 |
| Reasoning | OBQA (val) | Accuracy 39.6 | 9 |
| Multiple-choice science question answering | OBQA In-Distribution 64 | Accuracy 82.73 | 9 |
Showing 25 of 37 rows