Mathematical Reasoning on GSM8K (P@4, #Tok)
[Chart: Pass@4 and Avg. Tokens (#Tok) by date. Best Pass@4: 94.5 (Vanilla LLM), May 9, 2026.]
Updated 22d ago
Evaluation Results
Method           Date     Pass@4  Avg. Tokens (#Tok)
Vanilla LLM      2026.05  94.5    2,100
CRISP            2026.05  94.3    1,700
MPD              2026.05  94.2    1,400
Direct Comp.     2026.05  94.1    1,200
LiteCoT          2026.05  52      1,200
Chain-of-Draft   2026.05  42.8    600