Rethinking Supervision Granularity: Segment-Level Learning for LLM-Based Theorem Proving

About

Automated theorem proving with large language models in Lean 4 is commonly approached through either step-level tactic prediction with tree search or whole-proof generation. These two paradigms represent opposite granularities for constructing supervised training data: the former provides dense local signals but may fragment coherent proof processes, while the latter preserves global structure but requires complex end-to-end generation. In this paper, we revisit supervision granularity as a training set construction problem over proof trajectories and propose segment-level supervision, a training data construction strategy that extracts locally coherent proof segments for training policy models. We further reuse the same strategy at inference time to trigger short rollouts for existing step-level models. When trained with segment-level supervision on STP, LeanWorkbook, and NuminaMath-LEAN, the resulting policy models achieve proof success rates of 64.84%, 60.90%, and 66.31% on miniF2F, respectively, consistently outperforming both step-level and whole-proof baselines. Goal-aware rollout further improves existing step-level provers while reducing inference costs. It increases the proof success rate of BFS-Prover-V2-7B from 68.77% to 70.74% and that of InternLM2.5-StepProver from 59.59% to 60.33%, showing that appropriate supervision granularity better aligns model learning with proof structure and search. Code and models are available at https://github.com/NJUDeepEngine/SEG-ATP.

Shuo Xu, Jiakun Zhang, Junyu Lai, Chun Cao, Jingwei Xu• 2026

Related benchmarks

Task	Dataset	Result
Formal Theorem Proving	MiniF2F	Proof Success Rate66.31	12
Formal Theorem Proving	MiniF2F	Average Token Cost446.4	12
Formal Theorem Proving	STP In-domain	Average Token Cost2.60e+3	4
Formal Theorem Proving	NuminaMath LEAN (In-domain)	Average Token Cost1.71e+3	4
Formal Theorem Proving	LeanWorkbook (In-domain)	Average Token Cost602.1	4

Showing 5 of 5 rows

Other info

Follow for update

@wizwand_team Discord