CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization
About
Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a fundamental objective mismatch: minimizing imitation loss (e.g., cross-entropy) does not guarantee solution cost minimization. We dissect this mismatch into two deficiencies: Decoder-Blindness (being oblivious to the non-differentiable decoding process) and Cost-Blindness (prioritizing structural imitation over solution quality). We empirically demonstrate that these intrinsic flaws impose a hard performance ceiling. To overcome this limitation, we propose CADO (Cost-Aware Diffusion models for Optimization), a streamlined Reinforcement Learning fine-tuning framework that formulates the diffusion denoising process as an MDP to directly optimize the post-decoded solution cost. We introduce Label-Centered Reward, which repurposes ground-truth labels as unbiased baselines rather than imitation targets, and Hybrid Fine-Tuning for parameter-efficient adaptation. CADO achieves state-of-the-art performance across diverse benchmarks, validating that objective alignment is essential for unlocking the full potential of heatmap-based solvers.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Traveling Salesman Problem | TSP-100 | Optimality Drop0.06 | 53 | |
| Maximum Independent Set | ER [700-800] | Solution Size43.62 | 48 | |
| Traveling Salesperson Problem | TSP-100 | Solution Length7.76 | 42 | |
| Traveling Salesman Problem | TSP-500 | Solution Length16.65 | 32 | |
| Traveling Salesperson Problem | TSP-1k | Solution Length23.32 | 31 | |
| Maximum Independent Set | SATLIB | MIS Size425.4 | 23 | |
| Traveling Salesman Problem | TSPLIB 50-200 | Drop0.46 | 10 | |
| Traveling Salesperson Problem | TSP-500 | Solution Length16.65 | 9 | |
| Traveling Salesman Problem | TSP-10k | Tour Length73.69 | 9 | |
| Traveling Salesperson Problem | TSP-10k | Solution Length73.69 | 2 |