
BBH

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Reasoning | BBH | Accuracy | 95.4 | 672 |
| Logical Reasoning | BBH | Accuracy | 100 | 201 |
| General Reasoning | BBH | BBH General Reasoning Accuracy | 94.6 | 98 |
| Reasoning | BBH (test) | Accuracy | 62.06 | 67 |
| Reasoning | BBH 3-shot | BBH 3-shot Score | 65.69 | 49 |
| Complex Reasoning | BBH (val) | Accuracy | 65.81 | 42 |
| Complex Reasoning | BBH | Accuracy | 85.93 | 40 |
| Instruction Following | BBH | Accuracy | 67.1 | 40 |
| Question Answering | BBH | Accuracy | 94.6 | 30 |
| Logical Reasoning | BBH (test) | Top@1 Accuracy | 88.29 | 27 |
| Common-sense Reasoning | BBH | Accuracy | 58.27 | 27 |
| Symbolic and Logical Reasoning | BBH | Accuracy | 85.01 | 20 |
| Benchmark Compression (Coreset selection) | BBH (full) | rho | 0.913 | 20 |
| General Reasoning | BBH | Accuracy | 82.9 | 18 |
| Reasoning and Classification | BBH (Big-Bench Hard) (unseen) | BBH Temporal Sequences | 98.8 | 17 |
| Complex Reasoning | BBH | Acc | 83.03 | 16 |
| Reasoning | BBH | BBH Pass@1 | 69.92 | 16 |
| General Reasoning | BBH | Relative Cost | 1 | 14 |
| Big-Bench Hard Reasoning | BBH | Accuracy | 69.16 | 14 |
| General Reasoning | BBH | Accuracy (BBH) | 73.2 | 12 |
| Hard Reasoning Tasks | BBH | BBH Accuracy (0-shot) | 52.1 | 12 |
| Reasoning | BBH (unseen) | Total Average Score | 42.38 | 12 |
| General Reasoning | BBH | Score | 88.7 | 12 |
| Navigation Reasoning | BBH-Navigate (test) | Accuracy | 98 | 11 |
| Reasoning | bbh-zh | Overall Score | 87.52 | 10 |
Showing 25 of 54 rows