Proof writing on IMO-ProofBench
[Chart: Avg@3 Grade Score over time. Best score: 58.7 (Gemini 3 Pro). Date shown: Apr 6, 2026.]
Updated 12d ago
Evaluation Results
| Method | Details | Date | Avg@3 Grade Score |
| --- | --- | --- | --- |
| Gemini 3 Pro | Training/Evaluation Pr... | 2026.04 | 58.7 |
| DeepSeek-Math-V2 | Model Size=685B | 2026.04 | 57.9 |
| QED-Nano (+ RSA test-time scaffold) | Model Size=4B, Trainin... | 2026.04 | 56.9 |
| GPT-OSS-120B | Model Size=120B | 2026.04 | 43.1 |
| Nomos-1 | | 2026.04 | 40.3 |
| QED-Nano | Model Size=4B | 2026.04 | 40 |
| QED-Nano (SFT initialization only) | Model Size=4B, Trainin... | 2026.04 | 39.5 |
| GPT-OSS-20B | Model Size=20B | 2026.04 | 38.3 |
| Qwen3-235B-A22B-Thinking-2507 | Model Size=235B-A22B | 2026.04 | 34.1 |
| Qwen3-30B-A3B-Thinking-2507 | Model Size=30B-A3B | 2026.04 | 27.6 |
| Qwen3-4B-Thinking-2507 | Model Size=4B | 2026.04 | 20.4 |