COAgents: Multi-Agent Framework to Learn and Navigate Routing Problems Search Space
About
Although Vehicle Routing Problems (VRP) are essential to many real-world systems, they remain computationally intractable at scale due to their combinatorial complexity. Traditional heuristics rely on handcrafted rules for local improvements and occasional *jumps* to escape local minima, but often struggle to generalize across diverse instances. We introduce **COAgents**, a cooperative multi-agent framework that models the search process as a graph: nodes represent solutions, and edges correspond to either local refinements or large perturbations for diversification (i.e., jumps). A *Partial Search Graph* (PSG) is dynamically constructed during search, enabling COAgents to train a Node Selection Agent and a Move Selection Agent to guide intensification, and a Jump Agent to trigger well-timed explorations of new regions. Unlike end-to-end learning approaches, COAgents cleanly separates problem-agnostic search control from compact domain-specific encoding, facilitating adaptability across tasks. Extensive experiments on the CVRP and VRPTW benchmarks show that COAgents remains competitive with several learn-to-search baselines on CVRP and sets a new state of the art among learning-based methods on the more challenging VRPTW instances, reducing the gap to the best-known solutions by 14% at N = 100 and 44% at N = 50 relative to the strongest neural solver (POMO), and by 21% and 40% respectively relative to ALNS. Code is available at https://github.com/mahdims/COAgents.
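To make the graph view of search concrete, the following is a minimal Python sketch of a partial search graph grown during a toy search. It is not the COAgents implementation: the names `PartialSearchGraph`, `tour_cost`, and `search` are hypothetical, the toy problem is a single closed tour rather than CVRP or VRPTW, and the three learned agents are replaced by simple hand-written stand-ins (cheapest-node selection, a random swap move, and a stagnation-triggered shuffle as the jump).

```python
import random
from dataclasses import dataclass, field


@dataclass
class PartialSearchGraph:
    """Solutions visited so far (nodes) and the transitions between them (edges)."""
    costs: dict = field(default_factory=dict)  # solution tuple -> objective value
    edges: list = field(default_factory=list)  # (parent, child, kind) triples

    def add(self, parent, child, cost, kind):
        self.costs.setdefault(child, cost)
        if parent is not None:
            self.edges.append((parent, child, kind))


def tour_cost(tour, dist):
    """Length of the closed tour under the distance matrix `dist`."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def search(dist, steps=200, patience=10, seed=0):
    """Grow a partial search graph; all three policies are illustrative stand-ins."""
    rng = random.Random(seed)
    n = len(dist)  # assumes n >= 3 so that a swap of two non-initial stops exists
    start = tuple(range(n))
    psg = PartialSearchGraph()
    psg.add(None, start, tour_cost(start, dist), "move")
    region = [start]  # nodes reached since the last jump
    best, stall = psg.costs[start], 0
    for _ in range(steps):
        # Node selection (stand-in): expand the cheapest node in the current region.
        parent = min(region, key=psg.costs.get)
        if stall >= patience:
            # Jump (stand-in): shuffle all but the first stop to leave the region.
            rest = list(parent[1:])
            rng.shuffle(rest)
            child, kind = (parent[0], *rest), "jump"
            region, stall = [], 0
        else:
            # Move selection (stand-in): swap two randomly chosen stops.
            i, j = rng.sample(range(1, n), 2)
            t = list(parent)
            t[i], t[j] = t[j], t[i]
            child, kind = tuple(t), "move"
        cost = tour_cost(child, dist)
        psg.add(parent, child, cost, kind)
        region.append(child)
        if cost < best:
            best, stall = cost, 0
        else:
            stall += 1
    return psg, best


if __name__ == "__main__":
    toy = [[0, 2, 9, 4], [2, 0, 3, 8], [9, 3, 0, 1], [4, 8, 1, 0]]
    graph, best_cost = search(toy)
    print(len(graph.costs), "nodes,", len(graph.edges), "edges, best cost", best_cost)
```

In this sketch the search-control loop only touches solutions through `tour_cost` and the swap and shuffle operators, which loosely mirrors the separation between problem-agnostic control and domain-specific encoding described above. In a learned variant, each stand-in policy would presumably be replaced by a trained agent that reads features from the graph.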
Related benchmarks
| Task | Dataset | Objective Value | Rank |
|---|---|---|---|
| Capacitated Vehicle Routing Problem | CVRP N=100 10,000 instances (test) | 16.05 | 44 |
| Capacitated Vehicle Routing Problem | CVRP N=20 10,000 instances (test) | 6.18 | 38 |
| Capacitated Vehicle Routing Problem | CVRP N=50 10,000 instances (test) | 10.6 | 29 |
| Vehicle Routing Problem with Time Windows | VRPTW 1k N=50 (test) | 14.77 | 9 |
| Vehicle Routing Problem with Time Windows | VRPTW 1k N=100 (test) | 25.26 | 9 |