ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis

About

LLMs can solve program synthesis tasks but remain inefficient and unreliable on hard instances requiring large combinatorial search. Given a small set of reasoning traces, we use coding agents to compile them into reusable symbolic program synthesizers over constrained DSLs. The resulting solvers require no LLM calls at test time and are strong standalone systems: symbolic solver ensembles reach 91.3% accuracy on PBEBench-Lite and 84.7% on PBEBench-Hard. On PBEBench-Hard, this outperforms LLMs with test-time scaling by 16.3 percentage points at zero LLM inference cost. The solvers also complement LLM search, improving PBEBench-Hard accuracy from 68.4% to 85.8% while reducing reported token usage by 78%, and raising SLR-Bench hard-tier accuracy from 34.4% to 58.0% in a neuro-symbolic hybrid setting. Compared to directly using coding agents as per-instance solvers, induced solvers are substantially more Pareto-efficient, amortizing a small one-time construction cost over many zero-token executions. Finally, most solvers transfer zero-shot to a real historical linguistics task (predicting sound changes in natural language data), reaching 80.1% accuracy under ensembling and recovering some plausible linguistic rules. Together, these results show that reasoning traces can be compiled into reusable symbolic solvers that solve many tasks directly, complement LLM inference on hard cases, and provide a scalable route to domain-general solver induction. We release code and data for reproducibility.
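To make the hybrid idea concrete, here is a minimal sketch of a zero-token symbolic solver with an optional LLM fallback. It is not the paper's implementation. The toy DSL (ordered cascades of single-character `replace` steps), the names `enumerate_solver` and `solve`, and the `llm_fallback` callable are all assumptions introduced for illustration only.

```python
from itertools import product
from typing import Callable, Optional, Sequence

Example = tuple[str, str]               # (input, expected output)
Program = tuple[tuple[str, str], ...]   # ordered cascade of (old, new) rewrites
Solver = Callable[[Sequence[Example]], Optional[Program]]


def run(program: Program, text: str) -> str:
    """Apply each replace(old, new) step in order."""
    for old, new in program:
        text = text.replace(old, new)
    return text


def consistent(program: Program, examples: Sequence[Example]) -> bool:
    """A program is accepted only if it reproduces every example."""
    return all(run(program, x) == y for x, y in examples)


def enumerate_solver(examples: Sequence[Example], max_steps: int = 2) -> Optional[Program]:
    """Hypothetical symbolic solver: brute-force search over short cascades
    of single-character rewrites, shortest programs first."""
    alphabet = sorted({c for x, y in examples for c in x + y})
    rules = [(a, b) for a in alphabet for b in alphabet if a != b]
    for n in range(max_steps + 1):
        for program in product(rules, repeat=n):
            if consistent(program, examples):
                return program
    return None


def solve(examples: Sequence[Example],
          solvers: Sequence[Solver],
          llm_fallback: Optional[Solver] = None):
    """Try each symbolic solver first (no LLM tokens spent); call the
    assumed LLM fallback only if none of them finds a consistent program."""
    for solver in solvers:
        program = solver(examples)
        if program is not None and consistent(program, examples):
            return program, "symbolic"
    if llm_fallback is not None:
        return llm_fallback(examples), "llm"
    return None, "unsolved"
```

For example, `solve([("abc", "xbc"), ("banana", "bxnxnx")], [enumerate_solver])` finds the one-step cascade that rewrites `a` to `x` without any fallback call. In this sketch an ensemble is simply a list of such solvers tried in order; how the paper's ensembles are actually combined is not specified here.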

Atharva Naik, Yash Mathur, Prakam, Carolyn Rose, David Mortensen • 2026

Related benchmarks

Task	Dataset	Result
String transformation	PBEBench Lite	Accuracy 93.9	15
Inductive Prolog rule synthesis	SLR-Bench Overall 1,000 tasks (full)	Accuracy (%) 86.7	13
Inductive Prolog rule synthesis	SLR-Bench Medium tier 250 tasks 1	Accuracy 88.8	13
Inductive Prolog rule synthesis	SLR-Bench Hard tier 250 tasks 1	Accuracy 58.4	13
Inductive Prolog rule synthesis	SLR-Bench Easy tier 1 (250 tasks)	Accuracy 100	13
Inductive Prolog rule synthesis	SLR-Bench Basic tier 250 tasks 1	Accuracy 100	13
Long-horizon cascade synthesis	PBEBench Hard	Accuracy 85.8	10

