BBH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Reasoning | BBH | Accuracy: 95.4 | 726 |
| Logical Reasoning | BBH | Accuracy: 100 | 249 |
| General Reasoning | BBH | Accuracy: 93.2 | 190 |
| General Reasoning | BBH | BBH General Reasoning Accuracy: 94.6 | 103 |
| Reasoning | BBH (test) | Accuracy: 73.9 | 94 |
| Complex Reasoning | BBH | Accuracy: 90.5 | 85 |
| Instruction Induction | BBH Induct | Accuracy: 91.3 | 80 |
| Reasoning | BBH 3-shot | BBH 3-shot Score: 65.69 | 49 |
| Reasoning | BBH | BBH Pass@1: 83.69 | 49 |
| Complex Reasoning | BBH (val) | Accuracy: 65.81 | 42 |
| Causal Reasoning | BBH Causal Judgement | Accuracy (BBH Causal Judgement): 78 | 40 |
| Instruction Following | BBH | Accuracy: 67.1 | 40 |
| Reasoning | BBH | BBH Score: 84.5 | 39 |
| Reasoning | BBH | Score: 81.1 | 36 |
| Spatial Reasoning | BBH Navigate | Accuracy@1: 98 | 33 |
| Question Answering | BBH | Accuracy: 94.6 | 33 |
| Logical Reasoning | BBH (test) | Top@1 Accuracy: 88.29 | 29 |
| Deductive Reasoning | BBH Ded. | Accuracy: 92.5 | 28 |
| Common-sense Reasoning | BBH | Accuracy: 58.27 | 27 |
| Instruction Tuning | BBH | Accuracy (BBH): 66.2 | 24 |
| Logical Deduction | BBH Logical Deduction (Seven Objects) (test) | Accuracy: 55.2 | 22 |
| Common Sense Reasoning | BBH Sports Understanding | Accuracy (BBH Sports): 88 | 21 |
| Symbolic and Logical Reasoning | BBH | Accuracy: 85.01 | 20 |
| Benchmark Compression (Coreset selection) | BBH (full) | rho: 0.913 | 20 |
| tracking shuffled objects seven objects | BBH (test) | Accuracy: 92.8 | 20 |

Showing 25 of 88 rows