Arithmetic Reasoning on Game of 24 (test)
[Chart: Success Rate. Highlighted point: ICRL Preset, 90, May 21, 2025.]
Updated 23d ago
Evaluation Results
Method            Details                     Links     Success Rate
ICRL Preset       Base Model=GPT-4.1, Ev...   2025.05   90
ICRL Autonomous   Base Model=GPT-4.1, Ev...   2025.05   84
Best-of-N         Base Model=GPT-4.1, Ev...   2025.05   49
Long-CoT          Base Model=GPT-4.1, Ev...   2025.05   47
Self-Refine       Base Model=GPT-4.1, Ev...   2025.05   47
Reflexion         Base Model=GPT-4.1, Ev...   2025.05   44
CoT-only          Base Model=GPT-4.1, Ev...   2025.05   6