Mathematical Computation on MathQA
[Chart: Exact Match (EM) and F1 Score on MathQA. Best result: Prompt-R1, 52.34 EM (Nov 2, 2025).]
Evaluation Results
Method         Tag                        Date     Exact Match (EM)  F1 Score
Prompt-R1      -                          2025.11  52.34             61.59
CoT Reasoning  Backbone=GPT-4o-mini       2025.11  49.22             57.03
GRPO           Backbone=Qwen3-4B          2025.11  46.88             54.43
Baseline       Backbone=GPT-4o-mini       2025.11  46.09             54.04
TextGrad       Category=APO (GPT-4o-m...  2025.11  44.53             61.46
OPRO           Category=APO (GPT-4o-m...  2025.11  43.75             60.08
GEPA           Category=APO (GPT-4o-m...  2025.11  40.63             61.59
Baseline       Backbone=Qwen3-4B          2025.11  28.91             32.29
CoT Reasoning  Backbone=Qwen3-4B          2025.11  27.34             30.6
SFT            Backbone=Qwen3-4B          2025.11  17.97             22.66