Mathematical Reasoning on MathGAP (held-out)
[Chart: Accuracy by method. Best result: CORE, 87.3 Accuracy, May 27, 2026]
Updated 6d ago
Evaluation Results
| Method       | Training samples | Date    | Accuracy |
|--------------|------------------|---------|----------|
| CORE         | 5                | 2026.05 | 87.3     |
| GEPA         | 5                | 2026.05 | 85.3     |
| CORE         | 100              | 2026.05 | 84.3     |
| MemRL        | 100              | 2026.05 | 83.3     |
| CORE         | 10               | 2026.05 | 83       |
| GEPA         | 10               | 2026.05 | 79       |
| GEPA         | 100              | 2026.05 | 77.7     |
| Episodic RAG | 5                | 2026.05 | 77       |
| MemRL        | 5                | 2026.05 | 74.7     |
| MemRL        | 10               | 2026.05 | 71.3     |
| Episodic RAG | 100              | 2026.05 | 71       |
| Episodic RAG | 10               | 2026.05 | 59       |
| No Learning  | 5                | 2026.05 | 47.2     |
| No Learning  | 10               | 2026.05 | 47.2     |
| No Learning  | 100              | 2026.05 | 47.2     |
| GRPO         | 100              | 2026.05 | 44.3     |
| GRPO         | 10               | 2026.05 | 40       |
| GRPO         | 5                | 2026.05 | 39.3     |