Teacher-Aware Evolution of Heuristic Programs from Learned Optimization Policies

About

LLM-based automatic heuristic design has shown promise for generating executable heuristics for combinatorial optimization, but existing methods mainly rely on delayed endpoint performance. We propose a teacher-aware evolutionary framework that uses independently trained learned optimization policies as behavioral teachers. Instead of deploying or imitating the teacher, our method queries it on states visited by candidate heuristic programs and uses its action preferences as local feedback for evolution. The resulting search discovers static executable heuristics guided by both task performance and teacher-derived behavioral signals. Experiments on scheduling, routing, and graph optimization benchmarks show that our method improves over performance-driven LLM heuristic evolution baselines while requiring no neural inference at deployment. These results suggest that learned optimization policies can be repurposed as behavioral feedback sources for automatic heuristic discovery.
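
The idea of scoring a candidate heuristic by both task performance and teacher preferences on the states it visits can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the state encoding, the toy teacher, the names `teacher_agreement` and `combined_fitness`, and the weighted-sum combination with `weight` are all hypothetical, since the abstract does not specify how the two signals are combined.

```python
from typing import Callable, List, Sequence

# Hypothetical encoding: a state is one number per available action
# (for example, the processing time of each schedulable job).
State = Sequence[float]
Heuristic = Callable[[State], int]        # returns the index of the chosen action
Teacher = Callable[[State], List[float]]  # returns a preference distribution over actions


def teacher_agreement(heuristic: Heuristic, teacher: Teacher,
                      visited_states: Sequence[State]) -> float:
    """Mean teacher probability assigned to the actions the heuristic chose."""
    if not visited_states:
        return 0.0
    total = 0.0
    for state in visited_states:
        action = heuristic(state)
        total += teacher(state)[action]
    return total / len(visited_states)


def combined_fitness(task_score: float, agreement: float, weight: float = 0.5) -> float:
    """Assumed combination rule: task score plus a weighted behavioral term."""
    return task_score + weight * agreement


# Toy stand-ins for two candidate heuristic programs.
def shortest_first(state: State) -> int:
    return min(range(len(state)), key=lambda i: state[i])


def longest_first(state: State) -> int:
    return max(range(len(state)), key=lambda i: state[i])


# Toy stand-in for a learned policy: prefers actions with smaller values.
def toy_teacher(state: State) -> List[float]:
    inverse = [1.0 / value for value in state]
    normalizer = sum(inverse)
    return [w / normalizer for w in inverse]


# States assumed to have been collected while running the candidates.
states = [[2.0, 4.0], [1.0, 3.0], [5.0, 5.0]]

agreement_short = teacher_agreement(shortest_first, toy_teacher, states)
agreement_long = teacher_agreement(longest_first, toy_teacher, states)
print(agreement_short > agreement_long)
```

In this toy setup the teacher is only queried during scoring; the heuristic that would be kept (`shortest_first` here) is a plain function that needs no teacher at run time, which mirrors the claim that no neural inference is required at deployment.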

Minyu Chen, Song Qin, Ling-I Wu, Jianxin Xue, Guoqiang Li • 2026

Related benchmarks

Task	Dataset	Result
Traveling Salesman Problem	TSP50	--	77
Job-Shop Scheduling Problem	Random JSSP 10 × 10	Makespan 915.9	9
Job-Shop Scheduling Problem	Random JSSP 15 × 15	Makespan 1.38e+3	9
Job-Shop Scheduling Problem	Random JSSP 20 × 20	Makespan 1.81e+3	9
Capacitated Vehicle Routing Problem	VRPLIB192 (generalization)	Total Route Length 3.78e+4	5
Capacitated Vehicle Routing Problem	CVRP50 (in-distribution)	Route Length 9.671	5
Capacitated Vehicle Routing Problem	CVRP200 (generalization)	Route Length 25.371	5
Traveling Salesman Problem	TSP200	Tour Length 12.25	5
Traveling Salesman Problem	TSPLIB70	Tour Length 8.65e+4	5
MaxCut	BA200w	MaxCut Value 189.8	4

Showing 10 of 16 rows
