
Game of 24

Benchmarks

Task Name                              Dataset Name                SOTA Result                    Trend
Mathematical Reasoning                 Game of 24                  Accuracy 758                   147
Mathematical Reasoning                 Game of 24 (test)           Accuracy 98                    35
Logical Reasoning                      Game-of-24                  Accuracy 94.33                 31
Arithmetic Reasoning                   Game of 24 (test)           Success Rate 90                28
Arithmetic reasoning (multi-solution)  Game of 24 4nu 137 (test)   Multi Solution Accuracy 76.25  12
Explorative Reasoning                  Game of 24 (test)           Accuracy 80                    11
Arithmetic Reasoning                   Game of 24                  Performance 85.3               11
Arithmetic Reasoning                   Game of 24 95 (test)        Success Rate 100               9
Game of 24                             Game of 24 100 tasks GPT-4  Success Rate 74                8
Mathematical Reasoning                 Game of 24                  pass@1 0.84                    6
Arithmetic Planning                    Game of 24                  Accuracy 86                    4
Reasoning                              Game of 24                  Inference Time (s) 60.93       4
Mathematical Reasoning                 Game of 24                  pass@1 100                     4