Math Reasoning on AIME 2022–2024
[Chart: Accuracy. Top entry: GRPO + WeMask (TF), 9.27, May 8, 2026]
Updated 22d ago
Evaluation Results
Method              Setting            Date     Accuracy
GRPO + WeMask (TF)  Training-free...   2026.05  9.27
GRPO + WeMask (TA)  Training-aware...  2026.05  7.8
GRPO                -                  2026.05  7.4
Qwen3-4B            -                  2026.05  5.92