Theorem Proving on PutnamBench Lean
[Chart: Solved Rate over time; highlighted point: Aleph Prover, 668, Feb 27, 2026]
Evaluation Results
Method | Setting | Date | Solved Rate
Aleph Prover | Limit=$1400, Compute B... | 2026.02 | 668
Aleph Prover | Limit=$400, Compute Bu... | 2026.02 | 637
Seed-Prover 1.5 (ByteDance) | Compute Budget=10 H20... | 2026.02 | 581
Aleph Prover | Limit=$100, Compute Bu... | 2026.02 | 500
Hilbert | Compute Budget=avg pas... | 2026.02 | 462
Seed-Prover (ByteDance) | Compute Budget=MEDIUM | 2026.02 | 329
Ax-Prover | Compute Budget=pass@1,... | 2026.02 | 91
Goedel-Prover-V2 | Compute Budget=pass@184 | 2026.02 | 86
DeepSeek-Prover-V2 | Compute Budget=pass@1024 | 2026.02 | 47
GPT-5 | Reasoning=ReAct, Turns... | 2026.02 | 28
DSP+ | Compute Budget=pass@128 | 2026.02 | 23
Bourbaki | Compute Budget=pass@512 | 2026.02 | 14
Kimina-Prover-7B-Distill | Compute Budget=pass@192 | 2026.02 | 10
Self-play Theorem Prover | Compute Budget=pass@3200 | 2026.02 | 8
ABEL | Compute Budget=pass@596 | 2026.02 | 7
Goedel-Prover-SFT | Compute Budget=pass@512 | 2026.02 | 7
InternLM2.5-StepProver | Compute Budget=pass@2x... | 2026.02 | 6
InternLM 7B | Compute Budget=pass@4096 | 2026.02 | 4
gemini-2.5-pro-exp-0325 | Compute Budget=pass@1 | 2026.02 | 3
gemini-2.0-flash-thinking-121 | Compute Budget=pass@1 | 2026.02 | 1
Deepseek R1 | Compute Budget=pass@1 | 2026.02 | 1
COPRA | Base Model=GPT-4o, Com... | 2026.02 | 1
GPT-4o | Compute Budget=pass@10 | 2026.02 | 1