A Minimal Agent for Automated Theorem Proving

About

We propose a minimal agentic baseline that enables systematic comparison across different AI-based theorem prover architectures. The design implements the core features shared among state-of-the-art systems: iterative proof refinement, library search, and context management. We evaluate this agentic approach on qualitatively different benchmarks and compare various frontier language models and design choices. Our results show performance competitive with state-of-the-art approaches, while using a significantly simpler architecture at a fraction of their cost. Additionally, we demonstrate consistent advantages of an iterative approach over multiple single-shot generations, especially in sample efficiency and cost-effectiveness. The implementation is released as open source, both as a candidate reference for future research and as an accessible prover for the community.
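The contrast between iterative proof refinement and repeated single-shot generation can be sketched in a few lines of Python. This is not the authors' implementation. `iterative_prove`, `single_shot_prove`, `build_prompt`, `CheckResult`, and the toy functions `toy_model`, `toy_checker`, and `toy_search` are hypothetical stand-ins for a language model, the Lean checker, and library search. The context-management step is also an assumption: the feedback history is simply truncated to a character budget.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    errors: str = ""


# Hypothetical interfaces (assumptions, not from the paper): a model maps a
# prompt to a candidate proof, a checker stands in for the Lean compiler,
# and search stands in for library (lemma) search.
Model = Callable[[str], str]
Checker = Callable[[str], CheckResult]
Search = Callable[[str], list[str]]


def build_prompt(theorem: str, lemmas: list[str], context: str) -> str:
    return (
        f"Theorem: {theorem}\n"
        f"Useful lemmas: {', '.join(lemmas)}\n"
        f"Previous attempts and errors:\n{context}"
    )


def iterative_prove(theorem: str, model: Model, checker: Checker, search: Search,
                    max_iters: int = 8,
                    max_context_chars: int = 4000) -> tuple[Optional[str], int]:
    """Refine one proof, feeding checker errors back into the next prompt."""
    history: list[str] = []
    for i in range(1, max_iters + 1):
        lemmas = search(theorem)
        # Crude context management (assumption): keep only the most recent feedback.
        context = "\n".join(history)[-max_context_chars:]
        candidate = model(build_prompt(theorem, lemmas, context))
        result = checker(candidate)
        if result.ok:
            return candidate, i
        history.append(f"Attempt {i}: {candidate}\nError: {result.errors}")
    return None, max_iters


def single_shot_prove(theorem: str, model: Model, checker: Checker, search: Search,
                      n_samples: int = 8) -> tuple[Optional[str], int]:
    """Query the model repeatedly with the same prompt and no feedback between attempts."""
    prompt = build_prompt(theorem, search(theorem), "")
    for i in range(1, n_samples + 1):
        candidate = model(prompt)
        if checker(candidate).ok:
            return candidate, i
    return None, n_samples


# Toy components, for illustration only.
def toy_model(prompt: str) -> str:
    # Adds one more "step" for every earlier failed attempt it sees in the prompt.
    k = prompt.count("Attempt ") + 1
    return "; ".join(["step"] * k)


def toy_checker(candidate: str) -> CheckResult:
    # Accepts any candidate with at least three steps.
    if candidate.count("step") >= 3:
        return CheckResult(ok=True)
    return CheckResult(ok=False, errors="proof incomplete")


def toy_search(theorem: str) -> list[str]:
    return ["Nat.add_comm"]


if __name__ == "__main__":
    print(iterative_prove("a + b = b + a", toy_model, toy_checker, toy_search))
    # → ('step; step; step', 3)
    print(single_shot_prove("a + b = b + a", toy_model, toy_checker, toy_search))
    # → (None, 8)
```

With the same budget of eight model calls, the toy iterative loop succeeds on its third attempt because each prompt carries the checker's feedback, while the single-shot loop re-sends an identical prompt and never improves. A real model samples stochastically, so its single-shot attempts would differ from one another; the toy only isolates the effect of feeding errors back into the prompt.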

Borja Requena, Austin Letson, Krystian Nowakowski, Izan Beltran-Ferreiro, Leopoldo Sarra • 2026

Related benchmarks

Task	Dataset	Result
Formal Theorem Proving	PutnamBench	--	56
Theorem Proving	PutnamBench Lean	Solved Rate: 91	23
Formal Theorem Proving	FATE-H	Solve Rate: 66	7
Automated Theorem Proving	FATE-M	Pass Rate: 98	5
Automated Theorem Proving	FATE-X	Pass Rate: 24	5
Mathematical Theorem Proving	PutnamBench	Total Spend: 8.47e+3	3
Automated Theorem Proving	LeanCAT	Pass Rate: 59	2
