Mathematical Reasoning on AIME 2026
[Chart: pass@1 over time, Mar 1, 2026 to May 20, 2026. Top result: Nemotron-Cascade-2 30B-A3B, 95 pass@1.]
Updated 13d ago
Evaluation Results
| Method | Details | Date | pass@1 |
|---|---|---|---|
| Nemotron-Cascade-2 30B-A3B | Tool-Integrated Reason... | 2026.03 | 95 |
| Qwen3.5 35B-A3B | Official/Recommended S... | 2026.03 | 91.1 |
| Nemotron-Cascade-2 30B-A3B | | 2026.03 | 90.9 |
| Nemotron-3-Nano 30B-A3B | Official/Recommended S... | 2026.03 | 89.9 |
| Nemotron-3-Super 120B-A12B | Official/Recommended S... | 2026.03 | 89.8 |
| Qwen3-4B-Thinking-2507 + CHIMERA | # Params=4B, Scale Cat... | 2026.03 | 82.7 |
| Qwen3-4B-Thinking-2507 | # Params=4B, Scale Cat... | 2026.03 | 80.8 |
| DeepSeek-R1-0528-Qwen3-8B | # Params=8B, Scale Cat... | 2026.03 | 78 |
| Qwen3-32B | # Params=32B, Scale Ca... | 2026.03 | 74.3 |
| DeepSeek-R1-Distill-Llama-70B | # Params=70B, Scale Ca... | 2026.03 | 59.4 |
| Qwen3-4B-Thinking-2507 + OpenScience | # Params=4B, Scale Cat... | 2026.03 | 53 |
| NFPO | Base Model=Qwen3-8B-Ba... | 2026.05 | 28.6 |
| DPPO | Base Model=Qwen3-8B-Ba... | 2026.05 | 26.5 |
| GRPO | Base Model=Qwen3-8B-Ba... | 2026.05 | 23 |
| NFPO | Base Model=Qwen3-1.7B-... | 2026.05 | 8.1 |
| DPPO | Base Model=Qwen3-1.7B-... | 2026.05 | 5.6 |
| Qwen3-8B-Base | Base Model=Qwen3-8B-Ba... | 2026.05 | 5.6 |
| GRPO | Base Model=Qwen3-1.7B-... | 2026.05 | 4.7 |
| Qwen3-1.7B-Base | Base Model=Qwen3-1.7B-... | 2026.05 | 2.1 |