Mathematical Reasoning on AIME In-Distribution 2026 (Accuracy)
[Chart: Accuracy over time; top result SkillFlow at 70 (May 13, 2026)]
Updated 19d ago
Evaluation Results
| Method | Method Category | Date | Accuracy |
|---|---|---|---|
| SkillFlow | Ours | 2026.05 | 70 |
| FlowSteer | Agent+RL | 2026.05 | 63.33 |
| AgentFlow | Agent+RL | 2026.05 | 60 |
| SkillRL | Agent+RL | 2026.05 | 56.67 |
| Qwen3.5-AFlow | AFlow,... | 2026.05 | 53.33 |
| v4-flash | Baseline | 2026.05 | 50 |
| Qwen3.5-9B | Baseline | 2026.05 | 46.67 |
| Qwen3.5-SFT | SFT, B... | 2026.05 | 36.67 |
| Qwen3.5-GRPO | GRPO,... | 2026.05 | 33.33 |