| Dataset Name | SOTA Method | Metric | Score | Trend | Updated |
|---|---|---|---|---|---|
| MiniF2F (test) | Kimina-Prover-Preview | Pass@1 | 80.74 | 100 | 4d ago |
| miniF2F Isabelle (val) | LEGO-Prover* | Success Rate | 57 | 41 | 4d ago |
| miniF2F Isabelle (test) | Lyra | Success Rate | 51.2 | 39 | 4d ago |
| miniF2F (val) | POETRY | Pass@1 | 42.2 | 15 | 4d ago |
| Lean (test) | α-DPG | Pass@1 | 73 | 14 | 4d ago |
| mathlib (val) | θ_mathlib (expert iterated on mathlib-train) | Pass@1 | 62.6 | 9 | 4d ago |
| PhysLeanData (test) | PhysProver | Classical Score | 58.8 | 6 | 4d ago |
| ProofNet (val) | Hierarchical Attention | Pass Rate | 9.04 | 6 | 4d ago |
| PutnamBench | Seed-Prover 1.5 | Solve Rate | 87.9 | 5 | 4d ago |
| ProofNet (test) | DeepSeek-Prover-V1.5-SFT + RMaxTS | Accuracy | 25.8 | 5 | 4d ago |
| mathlib (test) | θ_mathlib (expert iterated on mathlib-train) | Pass@1 | 63 | 3 | 4d ago |
| Metamath set.mm (val) | GPT-f (160m) | Performance Score | 29.22 | 3 | 4d ago |
| Combibench | Seed-Prover 1.5 | Solve Rate | 48 | 2 | 4d ago |
| Fate-X | Seed-Prover 1.5 | Solve Rate | 33 | 2 | 4d ago |
| Fate-H | Seed-Prover 1.5 | Solve Rate | 80 | 2 | 4d ago |
| large-scale benchmark 2,000 problems (test) | TheoremForge | FR Rate | 0.813 | 2 | 4d ago |
| Putnam 2025 | - | - | - | 0 | 4d ago |