Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
About
LLMs have demonstrated strong mathematical reasoning abilities by leveraging reinforcement learning with long chain-of-thought, yet they continue to struggle with theorem proving due to the lack of clear supervision signals when solely using natural language. Dedicated domain-specific languages like Lean provide clear supervision via formal verification of proofs, enabling effective training through reinforcement learning. In this work, we propose **Seed-Prover**, a lemma-style whole-proof reasoning model. Seed-Prover can iteratively refine its proof based on Lean feedback, proved lemmas, and self-summarization. To solve IMO-level contest problems, we design three test-time inference strategies that enable both deep and broad reasoning. Seed-Prover proves 78.1% of formalized past IMO problems, saturates MiniF2F, and achieves over 50% on PutnamBench, outperforming the previous state-of-the-art by a large margin. To address the lack of geometry support in Lean, we introduce a geometry reasoning engine **Seed-Geometry**, which outperforms previous formal geometry engines. We use these two systems to participate in IMO 2025 and fully prove 5 out of 6 problems. This work represents a significant advancement in automated mathematical reasoning, demonstrating the effectiveness of formal verification with long chain-of-thought reasoning.
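To make the lemma-style, Lean-verified setup concrete, here is a minimal Lean 4 sketch using Mathlib (illustrative only, not taken from the paper): an auxiliary lemma is checked by Lean on its own and then reused to close the main goal, giving the kind of clear pass/fail supervision signal the abstract describes. The statement and lemma names below are hypothetical.

```lean
import Mathlib

-- Auxiliary lemma (hypothetical example): verified by Lean independently,
-- so it could be cached and reused as a proved fact.
lemma sq_sub_nonneg (a b : ℝ) : 0 ≤ (a - b) ^ 2 := by
  positivity

-- Main goal: closed by reusing the already-verified lemma,
-- mirroring lemma-style whole-proof reasoning.
theorem two_mul_le_sq_add_sq (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  nlinarith [sq_sub_nonneg a b]
```

In the iterative setting described above, lemmas that Lean accepts can be fed back to the model as proved facts, while failed attempts return compiler errors that guide the next refinement.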
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Automated Theorem Proving | MiniF2F (test) | Success Rate | 99.6 | 93 |
| Euclidean geometry problem solving | IMO-30 | Solved Problems | 30 | 19 |
| Formal Theorem Proving | PutnamBench | Solve Rate | 50.4 | 5 |
| Formal Theorem Proving | Fate-H | Solve Rate | 35 | 2 |
| Formal Theorem Proving | Fate-X | Solve Rate | 9 | 2 |
| Formal Theorem Proving | CombiBench | Solve Rate | 39 | 2 |