Mathematical Reasoning on AIME 25 (Acc avg@32)
[Chart: Accuracy avg@32 (y-axis) by date (x-axis, Apr 19, 2026). Top entry: IOP-GSPO, 92.1]
Updated 26d ago
Evaluation Results
Method     Configuration               Date      Accuracy avg@32
IOP-GSPO   Model Architecture=Qwe...   2026.04   92.1
GSPO       Model Architecture=Qwe...   2026.04   88.4
Base       Model Architecture=Qwe...   2026.04   87.8
IOP-GSPO   Model Architecture=Qwe...   2026.04   83.5
GSPO       Model Architecture=Qwe...   2026.04   76.6
Base       Model Architecture=Qwe...   2026.04   72.9