
BBEH

Benchmarks

Task Name                Dataset Name  SOTA Result          Trend
Logical Reasoning        BBEH          Accuracy 58.9        21
General Reasoning        BBEH          Accuracy 78.8        19
Reasoning                BBEH (test)   Accuracy 34.5        14
LLM Routing              BBEH (val)    Top-1 Acc 66.4       14
LLM Routing              BBEH          Top-1 Accuracy 66.4  14
Reasoning                BBEH          pass@1 15.31         11
Adding Mistake           BBEH          AOC 67.2             7
Truncated CoT Answering  BBEH          AOC 0.665            7