Mathematical Computation on DAPO Math
[Chart: Exact Match (EM) by date. Plotted point: Prompt-R1, 26.56 EM, Nov 2, 2025. Metrics shown: Exact Match (EM), F1 Score (F1).]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | Exact Match (EM) | F1 Score (F1) |
|---|---|---|---|---|
| Prompt-R1 | Backbone=GPT-4o-mini | 2025.11 | 26.56 | 26.56 |
| CoT Reasoning | Backbone=GPT-4o-mini | 2025.11 | 20.31 | 20.32 |
| Baseline | Backbone=GPT-4o-mini | 2025.11 | 18.75 | 18.76 |
| GEPA | Optimization Framework... | 2025.11 | 13.28 | 14.06 |
| TextGrad | Optimization Framework... | 2025.11 | 10.16 | 10.27 |
| OPRO | Optimization Framework... | 2025.11 | 6.25 | 6.39 |
| GRPO | Backbone=Qwen3-4B | 2025.11 | 3.91 | 3.91 |
| SFT | Backbone=Qwen3-4B | 2025.11 | 3.13 | 3.13 |
| Baseline | Backbone=Qwen3-4B | 2025.11 | 0 | 0 |
| CoT Reasoning | Backbone=Qwen3-4B | 2025.11 | 0 | 0 |