Rethink Efficiency Side of Neural Combinatorial Solver: An Offline and Self-Play Paradigm
About
We propose ECO, a versatile learning paradigm that enables efficient offline self-play for Neural Combinatorial Optimization (NCO). ECO addresses key limitations in the field through: 1) Paradigm Shift: Moving beyond inefficient online paradigms, we introduce a two-phase offline paradigm consisting of supervised warm-up and iterative Direct Preference Optimization (DPO); 2) Architecture Shift: We deliberately design a Mamba-based architecture to further enhance the efficiency in the offline paradigm; and 3) Progressive Bootstrapping: To stabilize training, we employ a heuristic-based bootstrapping mechanism that ensures continuous policy improvement during training. Comparison results on TSP and CVRP highlight that ECO performs competitively with up-to-date baselines, with significant advantage on the efficiency side in terms of memory utilization and training throughput. We provide further in-depth analysis on the efficiency, throughput and memory usage of ECO. Ablation studies show rationale behind our designs.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Capacitated Vehicle Routing Problem | CVRP N=100 | Objective Value15.69 | 50 | |
| Traveling Salesman Problem | TSP-500 | Solution Length16.98 | 32 | |
| Traveling Salesman Problem | TSP-200 | Optimality Gap0.37 | 28 | |
| Capacitated Vehicle Routing Problem | CVRP-200 | Objective Value20.13 | 20 | |
| Traveling Salesperson Problem | TSP1000 | Objective Value24.24 | 8 | |
| Traveling Salesperson Problem | TSP5000 | Objective Value53.76 | 8 | |
| Capacitated Vehicle Routing Problem | CVRP 1000 | Objective Value65.08 | 7 | |
| Capacitated Vehicle Routing Problem | CVRP500 | Objective Value38.21 | 7 |