BBH

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Reasoning | BBH | Accuracy: 95.4 | 507 |
| Logical Reasoning | BBH | Accuracy: 100 | 93 |
| General Reasoning | BBH | BBH General Reasoning Accuracy: 88.7 | 43 |
| Complex Reasoning | BBH (val) | Accuracy: 65.81 | 42 |
| Complex Reasoning | BBH | Accuracy: 85.93 | 40 |
| Instruction Following | BBH | Accuracy: 67.1 | 40 |
| Reasoning | BBH (test) | Accuracy: 59.5 | 40 |
| Question Answering | BBH | Accuracy: 94.6 | 30 |
| Logical Reasoning | BBH (test) | Top@1 Accuracy: 88.29 | 27 |
| Common-sense Reasoning | BBH | Accuracy: 58.27 | 27 |
| Benchmark Compression (Coreset selection) | BBH (full) | rho: 0.913 | 20 |
| Reasoning | BBH | BBH Pass@1: 69.92 | 16 |
| Hard Reasoning Tasks | BBH | BBH Accuracy (0-shot): 52.1 | 12 |
| Reasoning | BBH (unseen) | Total Average Score: 42.38 | 12 |
| Navigation Reasoning | BBH-Navigate (test) | Accuracy: 98 | 11 |
| Reasoning | bbh-zh | Overall Score: 87.52 | 10 |
| Helpfulness, Honesty, and Harmlessness Alignment Evaluation | BBH HHH | Harmlessness Score: 95 | 10 |
| Comprehensive cognitive reasoning | BBH | BBH Comprehensive Reasoning Score: 40.65 | 8 |
| Reasoning and Classification | BBH (Big-Bench Hard) (unseen) | BBH Boolean Expressions: 88.4 | 8 |
| General Reasoning | BBH | Pass@1: 55.59 | 8 |
| Logical reasoning | BBH multiple-choice (first 1,000 samples) | Exact Match Accuracy: 86.2 | 7 |
| Logical Deduction | BBH Logical Deduction (Seven Objects) (test) | Accuracy: 47.5 | 6 |
| Navigation | BBH Navigation (test) | Accuracy: 83.1 | 6 |
| Complex reasoning | BBH | BBH Solution Rate: 67.4 | 6 |
| STEM | BBH | Accuracy: 70.8 | 6 |
Showing 25 of 32 rows