BIG-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Reasoning | BIG-Bench Hard | Accuracy: 91.1 | 68 |
| Reasoning | BIG-Bench Hard (BBH) (test) | Average Accuracy: 87.3 | 56 |
| General Reasoning | BIG-bench | Accuracy (General): 81.6 | 36 |
| Reasoning | Big-Bench Hard (BBH) | Accuracy: 60.39 | 33 |
| Reasoning | BIG-Bench Hard (train) | Accuracy: 91.9 | 28 |
| Word Sorting | BIG-bench Hard Word Sorting (test) | Test Accuracy: 45 | 26 |
| Symbolic and Logical Reasoning | Big-Bench Hard (BBH) | Exact Match Performance: 88.1 | 22 |
| Reasoning | Big-bench Hard (BBH) | Exact Match (EM): 53.53 | 20 |
| Multitask Language Understanding | BIG-bench-lite 24 tasks | Score: 3,777 | 17 |
| Generation | Big-Bench Hard (test) | Exact Match: 57.9 | 17 |
| Various NLP tasks (NLU and Reasoning) | BIG-bench (unseen) | Known Unknowns Score: 86.96 | 15 |
| Date Understanding | BIG-bench Hard Date Understanding (test) | Test Accuracy: 75.2 | 14 |
| Reasoning | BIG-Bench Hard MIX-14K | Accuracy: 69.9 | 12 |
| General Language Understanding | BIG-bench Mimicked | Sports Score: 99.7 | 11 |
| General Language Understanding | BIG-bench Original | Sports Score: 99.4 | 11 |
| Reasoning | BIG Bench Audio Speech Modality | Accuracy: 0.9341 | 10 |
| Reasoning | BIG-Bench Extra Hard | Score: 37.8 | 10 |
| Multi-task Language Understanding | BIG-bench | Hindu Knowledge: 80 | 10 |
| Task-solving | BIG-Bench Hard (BBH) (test) | Causal Judgement: 68.3 | 10 |
| Complex Multi-step Reasoning | Big-Bench Hard | Hard Accuracy: 85.7 | 9 |
| Language Reasoning | BBH (BIG-Bench Hard) | Object Counting Score: 99.4 | 8 |
| Complex Reasoning | BIG-bench Hard | Orig Score: 39.3 | 7 |
| Algorithmic Reasoning | Big-Bench Hard Word Sorting and Multi-step Arithmetic (test) | WS Accuracy: 80 | 7 |
| Multiple Choice Question Answering | BIG-bench HHH Eval | Overall Score: 87 | 7 |
| Causal judgment | Big-Bench Hard | Accuracy: 69.5 | 6 |
Showing 25 of 36 rows