Chain-of-Context Learning: Dynamic Constraint Understanding for Multi-Task VRPs

About

Multi-task Vehicle Routing Problems (VRPs) aim to minimize routing costs while satisfying diverse constraints. Existing solvers typically adopt a unified reinforcement learning (RL) framework to learn generalizable patterns across tasks. However, they often overlook how constraints and nodes change during the decision process, so the model fails to react accurately to the current context. To address this limitation, we propose Chain-of-Context Learning (CCL), a novel framework that progressively captures the evolving context to guide fine-grained node adaptation. Specifically, CCL constructs step-wise contextual information via a Relevance-Guided Context Reformulation (RGCR) module, which adaptively prioritizes salient constraints. This context then guides node updates through a Trajectory-Shared Node Re-embedding (TSNR) module, which aggregates shared node features from all trajectories' contexts and uses them to update the inputs for the next step. By modeling the evolving preferences of the RL agent, CCL captures step-by-step dependencies in sequential decision-making. We evaluate CCL on 48 diverse VRP variants, comprising 16 in-distribution tasks and 32 out-of-distribution tasks (with unseen constraints). Experimental results show that CCL performs favorably against state-of-the-art baselines, achieving the best performance on all in-distribution tasks and on the majority of out-of-distribution tasks.
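
The two modules described above can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the softmax weighting, the mean aggregation across trajectories, and the residual update with a `step_size` parameter are all assumptions chosen only to make the data flow concrete (relevance-weighted context per trajectory, then a shared update applied to the node embeddings for the next step).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reformulate_context(constraint_feats, relevance_scores):
    """Hypothetical stand-in for RGCR: weight each constraint feature
    vector by its softmax relevance and sum them into one context vector.
    (Assumption: the paper's actual relevance mechanism may differ.)"""
    weights = softmax(relevance_scores)
    dim = len(constraint_feats[0])
    return [
        sum(w * feat[k] for w, feat in zip(weights, constraint_feats))
        for k in range(dim)
    ]


def reembed_nodes(node_embs, trajectory_contexts, step_size=0.5):
    """Hypothetical stand-in for TSNR: average the context vectors of all
    trajectories into one shared vector, then add a scaled copy of it to
    every node embedding. (Assumption: mean pooling and a residual update.)"""
    n_traj = len(trajectory_contexts)
    dim = len(node_embs[0])
    shared = [sum(ctx[k] for ctx in trajectory_contexts) / n_traj for k in range(dim)]
    return [[x + step_size * s for x, s in zip(node, shared)] for node in node_embs]


# One decoding step with two trajectories, two constraints, and two nodes.
contexts = [
    reformulate_context([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0]),  # trajectory 1
    reformulate_context([[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0]),  # trajectory 2
]
next_inputs = reembed_nodes([[0.0, 0.0], [1.0, 1.0]], contexts)
print(next_inputs)
```

In this sketch, a higher relevance score shifts the context toward the corresponding constraint, and every node receives the same shared update, so the inputs to the next step reflect what all trajectories currently consider salient.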

Shuangchun Gui, Suyu Liu, Xuehe Wang, Zhiguang Cao • 2026

Related benchmarks

Task | Dataset | Result | Rank
Capacitated Vehicle Routing Problem | CVRP N=100 | Objective Value: 15.787 | 73
Vehicle Routing Problem with Time Windows | VRPTW 100 customers | Objective Value: 25.862 | 24
Capacitated Vehicle Routing Problem | CVRP N=50 | Objective Value: 10.463 | 17
Vehicle Routing Problem | OVRP n=100 | Time (m): 0.2833 | 17
Open Vehicle Routing Problem | OVRP50 | Objective Value: 6.61 | 12
Vehicle Routing Problem | OOD VRP Variants (No Time Windows), N=50 | Objective Value: 8.673 | 5
Vehicle Routing Problem | VRP Variants OOD With Time Windows N=50 | Objective Value: 13.536 | 5
Vehicle Routing Problem | OOD VRP Variants With Time Windows N=100 | Objective Value: 23.375 | 5
Vehicle Routing Problem with Time Windows | Homberger and Gehring VRPTW N=600 (test) | Objective Value: 2.06e+4 | 3