| MiniF2F (val) | InternLM2-StepProver | Success Rate 63.9 | | 59 | 4d ago |
| miniF2F Lean (test) | DeepSeekMath-Base | Pass@64 52 | | 24 | 4d ago |
| LeanDojo (random) | LeanListener | Pass@1 53.21 | | 16 | 4d ago |
| set.mm (test) | HOLOPHRASM + MetaGen-IL | Proofs Found (Test) 600 | | 14 | 4d ago |
| TheoremQA | InternLM2-20B | Accuracy 13.5 | | 13 | 4d ago |
| ProofNet (test) | HAGBP | Pass Rate (%) 15.25 | | 12 | 4d ago |
| LeanDojo (novel premises) | LeanListener | Pass@1 41.11 | | 12 | 4d ago |
| ProofNet (val) | DeepSeek-Prover-V1.5-RL + RMaxTS | Accuracy 25.4 | | 11 | 4d ago |
| miniF2F Lean (val) | DeepSeekMath-Base | Cumulative Pass Rate 60.2 | | 10 | 4d ago |
| Putnam 2025 (full) | SEED-PROVER 1.5 | Problem A1 Score 631 | | 8 | 4d ago |
| Small-scale benchmark Overall | Gemini-3-Pro | VR 33 | | 8 | 4d ago |
| DeepTheorem | DeepSeek-V3.2-Thinking (Agentic) | False Rate 54 | | 8 | 4d ago |
| DeepMath | Gemini-3-Flash | FR (Fetch Rate) 94 | | 8 | 4d ago |
| INT (Proof length 15) | BF-kSubS | Success Rate 91 | | 8 | 4d ago |
| INT (Proof length 10) | BF-kSubS | Success Rate 99 | | 8 | 4d ago |
| INT (Proof length 5) | BF-kSubS | Success Rate 99 | | 8 | 4d ago |
| ProofNet (all) | DeepSeek-Prover-V1.5-RL + RMaxTS | Accuracy 25.3 | | 7 | 4d ago |
| PISA 2021-10-22 (test) | Thor | Success Rate 57 | | 5 | 4d ago |
| miniF2F Lean (curriculum) | Evariste | Pass@64 32.1 | | 3 | 4d ago |
| LeanDojo Benchmark 4 Lean 3 (random) | ReProver | Pass@1 48.6 | | 2 | 4d ago |
| Metamath (test) | Evariste | Pass@8 65.6 | | 2 | 4d ago |
| iset.mm (test) | HOLOPHRASM + MetaGen-IL | Proofs Found 398 | | 2 | 4d ago |