Mathematical Reasoning on AIME 2025 (medium reasoning, with tools)
Top score: 91.7 (HarmonyAgent)
[Chart: score with 95% CI over time; latest data point Apr 1, 2026]
Evaluation Results
Method         Method details              Date      Score   95% CI
HarmonyAgent   Evaluation Harness=Har...   2026.04   91.7    87.5
gpt-oss-20b    Evaluation Harness=Ope...   2026.04   90.4    -