Mathematical Reasoning on Game of 24 (pass@k)
[Chart: pass@1 and pass@100 on Game of 24 over time. Highlighted point: TSLM, pass@1 = 100, Jan 30, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | pass@1 | pass@100 |
| --- | --- | --- | --- | --- |
| TSLM | evaluation_protocol=pa... | 2026.01 | 100 | - |
| Procedure Cloning | evaluation_protocol=pa... | 2026.01 | 47 | - |
| SC | evaluation_protocol=pa... | 2026.01 | 17 | - |
| GRPO | evaluation_protocol=pa... | 2026.01 | 15 | - |
| Tree-of-Thought | evaluation_protocol=pa... | 2026.01 | - | 32 |