Distribution-Aware Algorithm Design with LLM Agents

About

Many optimization problems arise repeatedly from a fixed but unknown distribution. Even when the worst-case problem is hard, this distribution may carry reusable structure, such as recurring geometry, decompositions, or resource patterns. We study how to infer such structure from sample instances and compile it into solver code that runs faster on future instances while preserving solution quality. Our central abstraction is a \emph{solver hint}: distribution-specific structure inferred from samples and used to specialize a solver. We prove that the empirically fastest sample-consistent solver generalizes in both correctness and runtime over fixed solver libraries, and that identifiable hints can be recovered from polynomially many samples. We instantiate the framework with LLM code agents on $21$ combinatorial-optimization distributions across $7$ problem classes. The synthesized solvers reach mean normalized quality $0.971$ while running orders of magnitude faster than classical heuristics, Gurobi, and time-limited exact backends, though they do not dominate every baseline on every family. Against LLM synthesis baselines, they are faster than one-shot Codex, one-shot Claude Code, and a best-of-$5$ open-model variant; they improve quality over Claude Code and best-of-$5$, while nearly matching Codex quality and running substantially faster. This isolates the contribution of the iterative synthesis loop without claiming uniform domination over every LLM baseline. On the PACE 2025 Dominating Set private instances, the synthesized solver is valid on all $100$ graphs and runs roughly $75\times$--$125\times$ faster than released competition solvers, within a few percent of their solution size. These results suggest LLM agents can discover distribution-specific computational shortcuts and compile them into efficient solver code.
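The selection rule named in the abstract (pick the empirically fastest solver that is consistent with the samples) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function name `select_fastest_consistent`, the toy "maximum element" problem, the three-solver library, and the reading of "sample-consistent" as "matches a reference value on every sample, up to a tolerance". The hypothetical hint is that every instance from this distribution is sorted in ascending order, so the last element is the maximum.

```python
import time
from typing import Callable, Dict, List, Optional, Sequence

Instance = Sequence[int]
Solver = Callable[[Instance], int]


def select_fastest_consistent(
    solvers: Dict[str, Solver],
    samples: List[Instance],
    reference: Solver,
    tol: float = 0.0,
) -> Optional[str]:
    """Return the name of the fastest solver that is consistent on all samples.

    Assumption (not from the paper): a solver is "sample-consistent" if, on
    every sample, its objective value is at least (1 - tol) times the
    reference value (maximization problem).
    """
    best_name: Optional[str] = None
    best_time = float("inf")
    for name, solve in solvers.items():
        total = 0.0
        consistent = True
        for inst in samples:
            start = time.perf_counter()
            value = solve(inst)
            total += time.perf_counter() - start
            if value < (1.0 - tol) * reference(inst):
                consistent = False  # wrong on a sample: discard this solver
                break
        if consistent and total < best_time:
            best_name, best_time = name, total
    return best_name


# Toy distribution: every instance is a long list sorted in ascending order.
samples = [list(range(k, k + 200_000)) for k in range(5)]

# Hypothetical solver library for the "maximum element" problem.
library: Dict[str, Solver] = {
    "exact_scan": lambda xs: max(xs),  # always correct, linear time
    "last_item": lambda xs: xs[-1],    # uses the "sorted" hint, constant time
    "first_item": lambda xs: xs[0],    # fast but wrong on these samples
}

chosen = select_fastest_consistent(library, samples, reference=max)
print(chosen)
```

In this toy setup `first_item` is rejected because it disagrees with the reference on the samples, and `last_item` is preferred over `exact_scan` because it is correct on every sample and cheaper to run. The sketch says nothing about how the paper's LLM agents propose candidate solvers; it only mirrors the selection step over a fixed library.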

Saharsh Koganti, Priyadarsi Mishra, Pierfrancesco Beneventano, Tomer Galanti • 2026

Related benchmarks

Task	Dataset	Result
Dominating Set	PACE Dominating Set 2025 (private instances)	Validity Score: 100	5
Dominating Set	PACE 2025	Validity Score: 100	5
Combinatorial Optimization	21 target distributions All (test)	Q_LLM: 0.971	1
Graph Coloring	21 target distributions Coloring (test)	Quality Score (LLM): 86.8	1
Maximum Independent Set	21 target distributions MIS (test)	Q Score (LLM): 99.2	1
Maximum Satisfiability	21 target distributions MAXSAT (test)	Q_LLM: 1	1
Minimum Dominating Set	MDS 21 target distributions (test)	Q_LLM: 97.3	1
Multi-dimensional Knapsack Problem	MDKP 21 target distributions (test)	Q (LLM): 0.973	1
Packing Linear Programming	21 target distributions Packing LP (test)	Q_LLM: 0.994	1
Traveling Salesperson Problem	TSP 21 target distributions (test)	Q_LLM Score: 0.993	1

