
BigBench-Hard

Benchmarks

Task Name             Dataset Name                        SOTA Result          Trend
Logical Reasoning     BigBench Hard Boolean Expressions   Accuracy 76.8        17
Reasoning             BigBench Hard Penguins              Accuracy 44.1        5
Linguistic Reasoning  BigBench Hard Disambiguation QA     Accuracy 55.1        5
Reasoning             BigBench-Hard collection averaged   Ours Accuracy 45.41  4