Open Question Answering on AIME 2025 (test)
[Chart: Accuracy over time. Best result: GRPO, 70 (Feb 10, 2026). Metrics shown: Accuracy, Result Length.]
Updated 4d ago
Evaluation Results
| Method | Links | Accuracy (%) | Result Length |
| --- | --- | --- | --- |
| GRPO | 2026.02 | 70 | 9,871 |
| ESTAR (optimization=RL) | 2026.02 | 70 | 3,788 |
| O1-Pruner | 2026.02 | 66.67 | 7,841 |
| FlashThink | 2026.02 | 66.67 | 6,504 |
| AdaptThink | 2026.02 | 66.67 | 4,513 |
| Length-Penalty (explicit_length_penalt...) | 2026.02 | 66.67 | 7,324 |
| ESTAR-LITE (early_stopping=classif...) | 2026.02 | 66.67 | 3,045 |
| ESTAR-FT (mode=fine-tuned) | 2026.02 | 66.67 | 3,413 |
| No-Thinking (reasoning=disabled) | 2026.02 | 26.67 | 1,513 |
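The results trade accuracy against result length, so it can help to look at both at once. The following is a minimal sketch, not part of the benchmark page: it copies the nine rows listed here and computes (a) each method's length reduction relative to GRPO and (b) the methods that no other method beats on both accuracy and length. The names `RESULTS`, `length_reduction`, and `pareto_front` are hypothetical, and choosing GRPO as the baseline is an assumption made for illustration.

```python
# Rows copied from the Evaluation Results listing: (method, accuracy, result length).
RESULTS = [
    ("GRPO", 70.0, 9871),
    ("ESTAR", 70.0, 3788),
    ("O1-Pruner", 66.67, 7841),
    ("FlashThink", 66.67, 6504),
    ("AdaptThink", 66.67, 4513),
    ("Length-Penalty", 66.67, 7324),
    ("ESTAR-LITE", 66.67, 3045),
    ("ESTAR-FT", 66.67, 3413),
    ("No-Thinking", 26.67, 1513),
]


def length_reduction(results, baseline="GRPO"):
    """Percent reduction in result length relative to the baseline method.

    The baseline choice (GRPO) is an assumption for illustration.
    """
    base_len = next(length for name, _, length in results if name == baseline)
    return {
        name: round(100.0 * (1 - length / base_len), 1)
        for name, _, length in results
    }


def pareto_front(results):
    """Methods that are not dominated on (higher accuracy, shorter length).

    A method is dominated if another method is at least as accurate and at
    least as short, and strictly better on one of the two.
    """
    front = []
    for name, acc, length in results:
        dominated = any(
            (a >= acc and l <= length) and (a > acc or l < length)
            for other, a, l in results
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    reductions = length_reduction(RESULTS)
    for name, acc, length in RESULTS:
        print(f"{name:15s} acc={acc:6.2f} len={length:5d} shorter_by={reductions[name]:5.1f}%")
    print("Not dominated:", pareto_front(RESULTS))
```

On these rows, ESTAR matches GRPO's accuracy with a much shorter result length, and the non-dominated set is ESTAR, ESTAR-LITE, and No-Thinking. This only summarizes the listed numbers and says nothing about statistical significance.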