FormalEvolve: Neuro-Symbolic Evolutionary Search for Diverse and Prover-Effective Autoformalization
About
Autoformalization aims to translate natural-language mathematics into compilable, machine-checkable statements. However, semantic consistency does not imply prover effectiveness: even semantically consistent formalizations can differ substantially in proof-search cost and success rate. In this work, we formulate autoformalization as a budgeted, test-time search for semantically consistent repertoires, and propose FormalEvolve, a compilation-gated neuro-symbolic evolutionary framework. FormalEvolve generates diverse candidates via LLM-driven mutation and crossover with bounded patch repair, while symbolic Abstract Syntax Tree (AST) rewrite operations further inject structural diversity. On CombiBench and ProofNet, under a strict generator-call budget of T = 100, FormalEvolve reaches semantic hit rates (SH@100) of 58.0% and 84.9%, respectively, and reduces the cross-problem concentration of semantic successes (lower Gini coefficient). Under a fixed prover budget, FormalEvolve also improves downstream proving performance on CombiBench. Code will be released publicly.
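To make the two reported quantities concrete, the following is a minimal sketch of how a hit rate and a Gini coefficient could be computed from per-problem counts of semantically consistent candidates. The function names (`gini`, `hit_rate`) and the toy counts are illustrative assumptions, not taken from the paper, and the paper's exact metric definitions may differ.

```python
# Hypothetical sketch: the metric definitions below are assumptions, not the paper's.

def gini(counts):
    """Gini coefficient of non-negative per-problem success counts.

    0.0 means successes are spread evenly across problems; values near
    (n - 1) / n mean they are concentrated on a single problem.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def hit_rate(counts):
    """Fraction of problems with at least one successful candidate."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c > 0) / len(counts)


# Toy data: successes per problem after the generator budget is spent.
even = [5, 5, 5, 5]      # successes spread evenly
skewed = [0, 0, 0, 20]   # same total, concentrated on one problem

print(hit_rate(even), gini(even))
print(hit_rate(skewed), gini(skewed))
```

In this toy setting both repertoires contain the same total number of successes, but the skewed one covers fewer problems and has a higher Gini coefficient, which is the kind of concentration the abstract reports reducing.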
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Statement generation | ProofNet N=186 (test) | CH@100 | 98.4 | 11 |
| Statement generation | CombiBench (N=100) | CH@100 | 1 | 11 |
| Autoformalization and Proving | CombiBench (N=100) | Pass@64 | 44 | 4 |
| Autoformalization and Proving | ProofNet N=186 (test) | Pass@64 | 0.6828 | 4 |