BIG-Bench

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Reasoning | BIG-Bench Hard | Accuracy: 91.1 | 68 |
| General Reasoning | BIG-bench | Accuracy @ t1: 74.6 | 29 |
| Reasoning | BIG-Bench Hard (BBH) (test) | Average Accuracy: 79.4 | 28 |
| Symbolic and Logical Reasoning | Big-Bench Hard (BBH) | Exact Match Performance: 88.1 | 22 |
| Reasoning | Big-bench Hard (BBH) | Exact Match (EM): 53.53 | 20 |
| Multitask Language Understanding | BIG-bench-lite 24 tasks | Score: 3,777 | 17 |
| Generation | Big-Bench Hard (test) | Exact Match: 57.9 | 17 |
| Various NLP tasks (NLU and Reasoning) | BIG-bench (unseen) | Known Unknowns Score: 86.96 | 15 |
| Date Understanding | BIG-bench Hard Date Understanding (test) | Test Accuracy: 75.2 | 14 |
| General Language Understanding | BIG-bench Mimicked | Sports Score: 99.7 | 11 |
| General Language Understanding | BIG-bench Original | Sports Score: 99.4 | 11 |
| Reasoning | BIG-Bench Extra Hard | Score: 37.8 | 10 |
| Multi-task Language Understanding | BIG-bench | Hindu Knowledge: 80 | 10 |
| Complex Reasoning | BIG-bench Hard | Orig Score: 39.3 | 7 |
| Algorithmic Reasoning | Big-Bench Hard Word Sorting and Multi-step Arithmetic (test) | WS Accuracy: 80 | 7 |
| Multiple Choice Question Answering | BIG-bench HHH Eval | Overall Score: 87 | 7 |
| Spoken Dialogue | Big Bench Audio (test) | S2T Accuracy: 72.9 | 6 |
| LLM Workflow Optimization | Big-Bench Hard (BBH) (test) | BBH Overall Accuracy: 78.6 | 6 |
| Task-solving | BIG-Bench Hard (BBH) (test) | Boolean Expressions: 84 | 6 |
| Natural Language Understanding | BIG-Bench Hard (BBH) | Accuracy: 42.1 | 5 |
| Diverse reasoning tasks | BIG-bench Hard (BBH) | Boolean Expressions: 83.2 | 5 |
| Reasoning | BIG-Bench Hard (train) | Causal Judgment: 67.7 | 5 |
| Multi-task Language Understanding | BIG-bench | Anachronisms: 49.1 | 5 |
| General Language Capability | BIG-bench 57 Task | Accuracy (Weighted): 48.7 | 5 |
| Movie Recommendation | BIG-bench Hard Movie Recommendation (test) | Test Accuracy: 79 | 4 |
Showing 25 of 30 rows