Leanabell-Prover-V2: Verifier-integrated Reasoning for Formal Theorem Proving via Reinforcement Learning

About

We introduce our Leanabell-Prover-V2, a 7B large language models (LLMs) that can produce formal theorem proofs in Lean 4, with verifier-integrated Long Chain-of-Thoughts (CoT). Following our previous work Leanabell-Prover-V1, we continual to choose to posttrain existing strong prover models for further performance improvement. In our V2 version, we mainly upgrade the Reinforcement Learning (RL) with feedback provided by the Lean 4 verifier. Crucially, verifier feedback, such as indicating success or detailing specific errors, allows the LLM to become ``self-aware'' of the correctness of its own reasoning process and learn to reflexively correct errors. Leanabell-Prover-V2 directly optimizes LLM reasoning trajectories with multi-turn verifier interactions, together with feedback token masking for stable RL training and a simple reward strategy. Experiments show that Leanabell-Prover-V2 improves performance by 3.2% (pass@128) with Kimina-Prover-Preview-Distill-7B and 2.0% (pass@128) with DeepSeek-Prover-V2-7B on the MiniF2F test set. The source codes, curated data and models are available at: https://github.com/Leanabell-LM/Leanabell-Prover-V2.

Xingguang Ji, Yahui Liu, Qi Wang, Jingyuan Zhang, Yang Yue, Rui Shi, Chenxi Sun, Fuzheng Zhang, Guorui Zhou, Kun Gai• 2025

Related benchmarks

Task	Dataset	Result
Formal Theorem Proving	MiniF2F (test)	--	128
Automated Theorem Proving	MiniF2F (test)	Success Rate78.2	93
Theorem Proving	Prover-Bench	Pass@3248.7	19
Theorem Proving	ProofNet (test)	Pass@3223.7	15
Counterexample generation	VERI-REASON	Pass@1144	10
Counterexample generation	VERI-FORMALIZE	Pass@1111	10
Counterexample generation	FOR-COUNTER	Pass@1106	10
Formal Proof Generation	Lean 4 (val)	Pass@125.4	6

Showing 8 of 8 rows

Other info

Follow for update

@wizwand_team Discord