Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience

About

Large language models have recently made significant progress to generate rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) remains challenging and computationally expensive, particularly when addressing problems at the undergraduate level and beyond. In this work, we present \textbf{Seed-Prover 1.5}, a formal theorem-proving model trained via large-scale agentic reinforcement learning, alongside an efficient test-time scaling (TTS) workflow. Through extensive interactions with Lean and other tools, the model continuously accumulates experience during the RL process, substantially enhancing the capability and efficiency of formal theorem proving. Furthermore, leveraging recent advancements in natural language proving, our TTS workflow efficiently bridges the gap between natural and formal languages. Compared to state-of-the-art methods, Seed-Prover 1.5 achieves superior performance with a smaller compute budget. It solves \textbf{88\% of PutnamBench} (undergraduate-level), \textbf{80\% of Fate-H} (graduate-level), and \textbf{33\% of Fate-X} (PhD-level) problems. Notably, using our system, we solved \textbf{11 out of 12 problems} from Putnam 2025 within 9 hours. Our findings suggest that scaling learning from experience, driven by high-quality formal feedback, holds immense potential for the future of formal mathematical reasoning.

Jiangjie Chen, Wenxiang Chen, Jiacheng Du, Jinyi Hu, Zhicheng Jiang, Allan Jie, Xiaoran Jin, Xing Jin, Chenggang Li, Wenlei Shi, Zhihong Wang, Mingxuan Wang, Chenrui Wei, Shufa Wei, Huajian Xin, Fan Yang, Weihao Gao, Zheng Yuan, Tianyang Zhan, Zeyu Zheng, Tianxi Zhou, Thomas Hanwen Zhu• 2025

Related benchmarks

TaskDatasetResultRank
Spatial ReasoningVSI-Bench--
255
Spatial ReasoningViewspatial
Accuracy43.8
125
Spatial ReasoningVSI-Bench 1.0 (test)
Average Score49.9
101
Spatial ReasoningCV-Bench
Accuracy85.2
79
Spatial ReasoningMMSI-Bench
Accuracy38.3
65
Formal Theorem ProvingPutnamBench
Solved Count581
42
Theorem ProvingPutnam-Bench
Pass@3287.9
29
Automated Formal Theorem ProvingPutnam 2025
Average Score1.16e+3
28
Theorem ProvingPutnamBench Lean
Solved Rate581
23
Formal Theorem ProvingCombibench
Solve Rate48
15
Showing 10 of 15 rows

Other info

Follow for update