Theorem Proving on Prover-Bench
[Chart: Pass@32. Top entry: LongCat-Flash-Prover, 70.8 (Mar 22, 2026)]
Evaluation Results
Method                    Setting                      Date      Pass@32
LongCat-Flash-Prover      Mode=sketch-proof, Sea...    2026.03   70.8
LongCat-Flash-Prover      Mode=sketch-proof, Sea...    2026.03   69.5
LongCat-Flash-Prover      Evaluation Mode=sketch...    2026.03   66.5
DeepSeek-Prover-V2-671B   Budget (b)=512               2026.03   59.1
LongCat-Flash-Prover      Evaluation Mode=whole-...    2026.03   57.9
Goedel-Prover-V2-32B      Model Category=Open-We...    2026.03   53.2
DeepSeek-Prover-V2-671B   Model Category=Open-We...    2026.03   52.9
LongCat-Flash-Prover      Evaluation Mode=whole-...    2026.03   49.9
Goedel-Prover-V2-8B       Model Category=Open-We...    2026.03   49.5
DeepSeek-Prover-V2-7B     Model Category=Open-We...    2026.03   49
Leanabell-Prover-V2-DS    Budget (b)=128               2026.03   48.7
Leanabell-Prover-V2-DS    Model Category=Open-We...    2026.03   47.8
Kimina-Prover-72B         Model Category=Open-We...    2026.03   44.6
Kimi-K2.5                 Model Category=Open-We...    2026.03   44.3
Leanabell-Prover-V2-KM    Budget (b)=128               2026.03   42.9
DeepSeek-V3.2             Model Category=Open-We...    2026.03   42.8
Leanabell-Prover-V2-KM    Model Category=Open-We...    2026.03   39.8
Kimina-Prover-8B          Model Category=Open-We...    2026.03   37.8
Gemini-3 Pro              Model Category=Close-W...    2026.03   33.5