Rethinking Efficiency in Neural Combinatorial Optimization: Batched Preference Optimization with Mamba

About

We study efficiency as a first-class objective in Neural Combinatorial Optimization (NCO) and present ECO, an efficient learning framework that combines batched preference optimization with a Mamba backbone. Instead of tightly interleaving every policy update with on-policy rollouts, ECO decouples trajectory generation from gradient updates through two stages: supervised warm-up on pre-computed solutions and iterative Direct Preference Optimization (DPO) on batched candidate sets generated by the current policy. We pair this learning pipeline with a mixed Mamba encoder-decoder that reduces memory growth on long sequences and improves hardware utilization. A local-search-guided bootstrapping strategy is further used during training to widen preference margins and stabilize iterative improvement. Importantly, local search is only used to construct stronger preference pairs during training and is never invoked at inference time. On TSP and CVRP, ECO achieves the strongest overall performance among the compared neural baselines while also delivering clear advantages in memory usage and throughput. We provide additional analysis on memory scaling, throughput, and the contribution of each design component.
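The iterative DPO stage described above can be illustrated with a minimal sketch. It assumes the standard DPO objective for a single preference pair, and it assumes a simple pairing rule (shortest tour as winner, longest as loser) for a batch of candidate TSP tours. The function names `tour_length`, `make_preference_pair`, and `dpo_loss`, the pairing rule, and the default `beta` are all hypothetical and not taken from the paper; the local-search-guided bootstrapping and the Mamba backbone are not reproduced here.

```python
import math

def tour_length(tour, coords):
    """Euclidean length of a closed tour over 2-D coordinates."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )

def make_preference_pair(candidates, coords):
    """Assumed pairing rule: shortest candidate wins, longest loses."""
    ranked = sorted(candidates, key=lambda t: tour_length(t, coords))
    return ranked[0], ranked[-1]

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    logp_* are sequence log-probabilities under the current policy,
    ref_logp_* under a frozen reference policy (here, hypothetically,
    the model after supervised warm-up).
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    x = beta * margin
    # Numerically stable softplus(-x), which equals -log(sigmoid(x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Toy usage: two candidate tours on the unit square.
coords = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
candidates = [[0, 2, 1, 3], [0, 1, 2, 3]]
winner, loser = make_preference_pair(candidates, coords)
loss = dpo_loss(logp_w=-1.0, logp_l=-5.0, ref_logp_w=-3.0, ref_logp_l=-3.0, beta=1.0)
```

Because candidates are generated in batches and scored before any gradient step, the pairs can be built once per iteration and reused across several updates, which is the decoupling the abstract refers to.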

Zhenxing Xu, Zeyuan Ma, Weidong Bao, Yan Zheng, Ji Wang, Zhiguang Cao • 2026

Related benchmarks

Task | Dataset | Result | Rank
Capacitated Vehicle Routing Problem | CVRP N=100 | Objective Value: 15.69 | 87
Traveling Salesman Problem | TSP-200 | Optimality Gap: 0.37 | 41
Traveling Salesman Problem | TSP-500 | Solution Length: 16.98 | 38
Capacitated Vehicle Routing Problem | CVRP-200 | Objective Value: 20.13 | 35
Capacitated Vehicle Routing Problem | CVRP 1000 | Objective Value: 65.08 | 29
Capacitated Vehicle Routing Problem | CVRP500 | Objective Value: 38.21 | 25
Traveling Salesperson Problem | TSP1000 | Objective Value: 24.24 | 8
Traveling Salesperson Problem | TSP5000 | Objective Value: 53.76 | 8