
ReaComp: Compiling LLM Reasoning into Symbolic Solvers for Efficient Program Synthesis

About

LLMs can solve program synthesis tasks but remain inefficient and unreliable on hard instances requiring large combinatorial search. Given a small set of reasoning traces, we use coding agents to compile them into reusable symbolic program synthesizers over constrained DSLs. The resulting solvers require no LLM calls at test time and are strong standalone systems: symbolic solver ensembles reach 91.3% accuracy on PBEBench-Lite and 84.7% on PBEBench-Hard, outperforming LLMs with test-time scaling on the latter by 16.3 percentage points at zero LLM inference cost. They also complement LLM search, improving PBEBench-Hard accuracy from 68.4% to 85.8% while reducing reported token usage by 78%, and raising SLR-Bench hard-tier accuracy from 34.4% to 58.0% in a neuro-symbolic hybrid setting. Compared to directly using coding agents as per-instance solvers, induced solvers are substantially more Pareto-efficient, amortizing a small one-time construction cost over many zero-token executions. Finally, most solvers transfer zero-shot to a real historical linguistics task (predicting sound changes in natural language data), reaching 80.1% accuracy under ensembling and recovering some plausible linguistic rules. Together, these results show that reasoning traces can be compiled into reusable symbolic solvers that solve many tasks directly, complement LLM inference on hard cases, and provide a scalable route to domain-general solver induction. We release code and data for reproducibility.
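To make the ensemble-plus-fallback idea concrete, here is a minimal sketch, not the released implementation. It assumes a toy DSL in which a program is an ordered cascade of `replace(old, new)` rules. The names `single_rule_solver`, `solve`, and the `fallback` callback (standing in for an LLM search step) are hypothetical and chosen only for illustration.

```python
from itertools import product
from typing import Callable, Optional, Sequence

Example = tuple[str, str]        # (input string, expected output string)
Program = list[tuple[str, str]]  # ordered cascade of replace(old, new) rules (assumed toy DSL)
Solver = Callable[[Sequence[Example]], Optional[Program]]


def run(program: Program, text: str) -> str:
    """Apply each replace rule in order."""
    for old, new in program:
        text = text.replace(old, new)
    return text


def consistent(program: Program, examples: Sequence[Example]) -> bool:
    """A program is accepted only if it reproduces every example."""
    return all(run(program, src) == dst for src, dst in examples)


def substrings(text: str, max_len: int = 2) -> set[str]:
    """All non-empty substrings of length at most max_len."""
    return {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, min(i + max_len, len(text)) + 1)
    }


def single_rule_solver(examples: Sequence[Example]) -> Optional[Program]:
    """Hypothetical symbolic solver: enumerate one rule over observed substrings."""
    sources = set().union(*(substrings(src) for src, _ in examples))
    targets = {""}.union(*(substrings(dst) for _, dst in examples))
    for old, new in product(sorted(sources), sorted(targets)):
        if consistent([(old, new)], examples):
            return [(old, new)]
    return None


def solve(
    examples: Sequence[Example],
    solvers: Sequence[Solver],
    fallback: Optional[Solver] = None,
) -> tuple[Optional[Program], str]:
    """Try zero-token symbolic solvers first; call the fallback only if all fail."""
    for solver in solvers:
        program = solver(examples)
        if program is not None and consistent(program, examples):
            return program, "symbolic"
    if fallback is not None:
        program = fallback(examples)
        if program is not None and consistent(program, examples):
            return program, "fallback"
    return None, "unsolved"


examples = [("banana", "bonono"), ("cat", "cot")]
program, source = solve(examples, [single_rule_solver])
```

In this sketch every candidate program is verified against the examples before it is returned, so the fallback is only consulted on instances the symbolic solvers cannot cover. That is the sense in which a hybrid could reduce token usage.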

Atharva Naik, Yash Mathur, Prakam, Carolyn Rose, David Mortensen • 2026

Related benchmarks

Task                            | Dataset                              | Metric       | Result | Rank
--------------------------------|--------------------------------------|--------------|--------|-----
String transformation           | PBEBench Lite                        | Accuracy     | 93.9   | 15
inductive Prolog rule synthesis | SLR-Bench Overall 1,000 tasks (full) | Accuracy (%) | 86.7   | 13
inductive Prolog rule synthesis | SLR-Bench Medium tier 250 tasks 1    | Accuracy     | 88.8   | 13
inductive Prolog rule synthesis | SLR-Bench Hard tier 250 tasks 1      | Accuracy     | 58.4   | 13
inductive Prolog rule synthesis | SLR-Bench Easy tier 1 (250 tasks)    | Accuracy     | 100    | 13
inductive Prolog rule synthesis | SLR-Bench Basic tier 250 tasks 1     | Accuracy     | 100    | 13
Long-horizon cascade synthesis  | PBEBench Hard                        | Accuracy     | 85.8   | 10