BIG-bench Hard

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Reasoning | Big-Bench Hard (BBH) (val) | Accuracy: 43.46 | 36 |
| Word Sorting | Big-Bench Hard Word Sorting | Success Rate: 79.8 | 4 |
| Counting | Big-Bench Hard Counting | Success Rate: 91.9 | 4 |
| Temporal Reasoning | BIG-bench Hard Temporal Sequences (test) | Test Accuracy: 62 | 4 |