Combinatorial Optimization with Policy Adaptation using Latent Space Search

About

Combinatorial Optimization underpins many real-world applications and yet, designing performant algorithms to solve these complex, typically NP-hard, problems remains a significant research challenge. Reinforcement Learning (RL) provides a versatile framework for designing heuristics across a broad spectrum of problem domains. However, despite notable progress, RL has not yet supplanted industrial solvers as the go-to solution. Current approaches emphasize pre-training heuristics that construct solutions but often rely on search procedures with limited variance, such as stochastically sampling numerous solutions from a single policy or employing computationally expensive fine-tuning of the policy on individual problem instances. Building on the intuition that performant search at inference time should be anticipated during pre-training, we propose COMPASS, a novel RL approach that parameterizes a distribution of diverse and specialized policies conditioned on a continuous latent space. We evaluate COMPASS across three canonical problems - Travelling Salesman, Capacitated Vehicle Routing, and Job-Shop Scheduling - and demonstrate that our search strategy (i) outperforms state-of-the-art approaches on 11 standard benchmarking tasks and (ii) generalizes better, surpassing all other approaches on a set of 18 procedurally transformed instance distributions.

Felix Chalumeau, Shikha Surana, Clement Bonnet, Nathan Grinsztajn, Arnu Pretorius, Alexandre Laterre, Thomas D. Barrett• 2023

Related benchmarks

Task	Dataset	Result
Math Reasoning	AMC	Accuracy6.8	95
Math Reasoning	MATH 500	Accuracy17	60
Capacitated Vehicle Routing Problem	CVRP procedurally transformed instances Mu-transformed (test)	Objective Value13.534	56
Traveling Salesman Problem	TSP procedurally transformed instances Mu 0-9 (test)	Objective Value6.738	50
Traveling Salesman Problem	TSP-500	Solution Length16.81	38
Traveling Salesperson Problem	TSP N=200 (Generalization (128 instances))	Optimality Gap0.348	35
Traveling Salesman Problem	TSP procedurally transformed instances Mu-transformed (test)	Objective Value6.738	26
Traveling Salesperson Problem	TSP n=100 (train)	Objective Value7.765	26
Traveling Salesman Problem (TSP)	TSP n=150 Generalization 1K instances	Objective Value9.354	25
Capacitated Vehicle Routing Problem (CVRP)	CVRP n=150 1K instances (Generalization)	Objective Value19.313	18

Showing 10 of 24 rows

Other info

Code

Follow for update

@wizwand_team Discord