Math Word Problem Solving on GSM8K (test)
[Chart: Accuracy over time. Highlighted point: MO-CAPO, 89.3 accuracy, May 15, 2026.]
Updated 14d ago
Evaluation Results
| Method | Settings (truncated in source) | Date | Accuracy |
|---|---|---|---|
| MO-CAPO | Model=GPT-OSS-120B, To... | 2026.05 | 89.3 |
| CAPO | Model=GPT-OSS-120B, To... | 2026.05 | 87.1 |
| GEPA | Model=GPT-OSS-120B, To... | 2026.05 | 85.7 |
| NSGA-II-PO | Model=GPT-OSS-120B, To... | 2026.05 | 85.2 |
| CAPO | Model=Mistral-3.2-24B,... | 2026.05 | 75.2 |
| MO-CAPO | Model=Mistral-3.2-24B,... | 2026.05 | 75.1 |
| NSGA-II-PO | Model=Mistral-3.2-24B,... | 2026.05 | 74.3 |
| CAPO | Model=Qwen3-30B, Token... | 2026.05 | 66.5 |
| MO-CAPO | Model=Qwen3-30B, Token... | 2026.05 | 66.2 |
| GEPA | Model=Qwen3-30B, Token... | 2026.05 | 65.9 |
| NSGA-II-PO | Model=Qwen3-30B, Token... | 2026.05 | 65.7 |
| EvoPromptGA | Model=GPT-OSS-120B, To... | 2026.05 | 58.4 |
| GEPA | Model=Mistral-3.2-24B,... | 2026.05 | 57.1 |
| Initial | Model=Qwen3-30B, Token... | 2026.05 | 54 |
| EvoPromptGA | Model=Qwen3-30B, Token... | 2026.05 | 53.7 |
| Initial | Model=GPT-OSS-120B, To... | 2026.05 | 49.5 |
| Initial | Model=Mistral-3.2-24B,... | 2026.05 | 46.3 |
| EvoPromptGA | Model=Mistral-3.2-24B,... | 2026.05 | 43.9 |