Arithmetic Reasoning on Game of 24
[Chart: Performance over time, Oct 22, 2025 to Mar 22, 2026. Best result: ReSCALE, 85.3. Updated 25d ago.]
Evaluation Results
| Method | Configuration | Links | Performance | Max Acc. |
|---|---|---|---|---|
| ReSCALE | Budget=Large, Tokens=4... | 2026.03 | 85.3 | 85.9 |
| AlphaZero | Budget=Medium, Tokens=... | 2026.03 | 84.3 | 86.2 |
| ReSCALE | Budget=Medium, Tokens=... | 2026.03 | 83.4 | 85.9 |
| AlphaZero | Budget=Large, Tokens=4... | 2026.03 | 82.9 | 84.8 |
| AlphaZero | Budget=Small, Tokens=0... | 2026.03 | 74.4 | 86.7 |
| ReSCALE | Budget=Small, Tokens=0... | 2026.03 | 71.6 | 81.2 |
| LDDM-G | Architecture=LDDM-G (O... | 2025.10 | 63 | - |
| Best-of-N | Budget=N = 32, Tokens=... | 2026.03 | 54.1 | - |
| MGDM | Architecture=MGDM, Par... | 2025.10 | 47 | - |
| LDDM-G | Architecture=LDDM-G (O... | 2025.10 | 28 | - |
| MGDM | Architecture=MGDM, Par... | 2025.10 | 12 | - |