| Dataset Name | SOTA Method | Metric | | Updated | Trend |
|---|---|---|---|---|---|
| MiniF2F (test) | Seed Prover | Pass@199.6 | 128 | 12d ago | |
| PutnamBench | | Solved Count 668 | 42 | 1mo ago | |
| miniF2F Isabelle (val) | LEGO-Prover* | Success Rate 57 | 41 | 3mo ago | |
| miniF2F Isabelle (test) | Lyra | Success Rate 51.2 | 39 | 3mo ago | |
| ProofNet | GOEDEL VALUE | Accuracy 24.26 | 26 | 8d ago | |
| miniF2F rw (test) | Goedel-Prover-V2-8B | Pass@875 | 24 | 12d ago | |
| miniF2F rw (val) | Goedel-Prover-V2-8B | Pass@881.1 | 24 | 12d ago | |
| Combibench | Seed-Prover 1.5 | Solve Rate 48 | 15 | 1mo ago | |
| miniF2F (val) | POETRY | Pass@142.2 | 15 | 3mo ago | |
| Lean (test) | α-DPG | Pass@173 | 14 | 3mo ago | |
| Number Theory | Hilbert | PutnamBench 2.51 | 13 | 1mo ago | |
| Inequality | Hilbert | 567NEQ3.1 | 13 | 1mo ago | |
| miniF2F | Segment-level | Proof Success Rate 66.31 | 12 | 21d ago | |
| miniF2F | | Average Token Cost 228.64 | 12 | 21d ago | |
| Putnam 2025 | rocq-mcp | Proof Lines 110 | 12 | 2mo ago | |
| ProofNet (test) | ProofSketcher | Pass@144.62 | 12 | 12d ago | |
| PutnamBench September 2025 | HILBERT | Solved Problems Count 462 | 11 | 2mo ago | |
| Ineq-Comp (test) | DeepSeek-Prover-V2-7B | Ineq-Comp (Seed) 66.7 | 9 | 12d ago | |
| mathlib (val) | θ_mathlib (expert iterated on mathlib-train) | Pass@162.6 | 9 | 3mo ago | |
| MathOlympiadBench (MoBench) | GOEDEL VALUE | Accuracy 34.44 | 8 | 1mo ago | |
| Fate-H | Seed-Prover 1.5 | Solve Rate 80 | 7 | 2mo ago | |
| FormalML Hard | | Proof Length 11.2 | 6 | 1mo ago | |
| Synthetic 20 | | Proof Length 14.5 | 6 | 1mo ago | |
| Library 10 | | Proof Length 10 | 6 | 1mo ago | |
| UniGeo (10) | | Proof Length 4.1 | 6 | 1mo ago | |