Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm

About

We propose ECO, a versatile learning paradigm that enables efficient offline self-play for Neural Combinatorial Optimization (NCO). ECO addresses key limitations in the field through: 1) Paradigm Shift: Moving beyond inefficient online paradigms, we introduce a two-phase offline paradigm consisting of supervised warm-up and iterative Direct Preference Optimization (DPO); 2) Architecture Shift: We deliberately design a Mamba-based architecture to further enhance the efficiency in the offline paradigm; and 3) Progressive Bootstrapping: To stabilize training, we employ a heuristic-based bootstrapping mechanism that ensures continuous policy improvement during training. Comparison results on TSP and CVRP highlight that ECO performs competitively with up-to-date baselines, with significant advantage on the efficiency side in terms of memory utilization and training throughput. We provide further in-depth analysis on the efficiency, throughput and memory usage of ECO. Ablation studies show rationale behind our designs.

Zhenxing Xu, Zeyuan Ma, Weidong Bao, Hui Yan, Yan Zheng, Ji Wang• 2026

Related benchmarks

TaskDatasetResultRank
Capacitated Vehicle Routing ProblemCVRP N=100
Objective Value15.69
50
Traveling Salesman ProblemTSP-500
Solution Length16.98
32
Traveling Salesman ProblemTSP-200
Optimality Gap0.37
28
Capacitated Vehicle Routing ProblemCVRP-200
Objective Value20.13
20
Traveling Salesperson ProblemTSP1000
Objective Value24.24
8
Traveling Salesperson ProblemTSP5000
Objective Value53.76
8
Capacitated Vehicle Routing ProblemCVRP 1000
Objective Value65.08
7
Capacitated Vehicle Routing ProblemCVRP500
Objective Value38.21
7
Showing 8 of 8 rows

Other info

Follow for update