Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories

About

Combinatorial optimisation problems framed as mixed integer linear programmes (MILPs) are ubiquitous across a range of real-world applications. The canonical branch-and-bound algorithm seeks to exactly solve MILPs by constructing a search tree of increasingly constrained sub-problems. In practice, its solving time performance is dependent on heuristics, such as the choice of the next variable to constrain ('branching'). Recently, machine learning (ML) has emerged as a promising paradigm for branching. However, prior works have struggled to apply reinforcement learning (RL), citing sparse rewards, difficult exploration, and partial observability as significant challenges. Instead, leading ML methodologies resort to approximating high quality handcrafted heuristics with imitation learning (IL), which precludes the discovery of novel policies and requires expensive data labelling. In this work, we propose retro branching; a simple yet effective approach to RL for branching. By retrospectively deconstructing the search tree into multiple paths each contained within a sub-tree, we enable the agent to learn from shorter trajectories with more predictable next states. In experiments on four combinatorial tasks, our approach enables learning-to-branch without any expert guidance or pre-training. We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables, with ablations verifying that our retrospectively constructed trajectories are essential to achieving these results.

Christopher W. F. Parsonson, Alexandre Laterre, Thomas D. Barrett• 2022

Related benchmarks

TaskDatasetResultRank
Mixed Integer Linear Programming BranchingMILP Standard Suite (test)
Set Covering Time (s)1.14
10
Mixed Integer Linear Programming BranchingMILP Suite Transfer instances Standard
Set Covering Time (s)59.4
10
MILP SolvingMultiple knapsack (train)
Search Tree Nodes250.3
9
MILP SolvingMultiple knapsack Transfer
Nodes Explored2.71e+4
9
MILP SolvingSet covering (train)
Search Tree Nodes183
9
MILP SolvingSet covering Transfer
Search Tree Nodes6.10e+3
9
MILP SolvingCombinatorial auction (train)
Nodes103.2
9
MILP SolvingCombinatorial auction Transfer
Nodes Explored2.91e+3
9
MILP SolvingMaximum independent set (train)
Nodes Explored223
9
MILP SolvingMaximum independent set Transfer
Nodes Explored1.19e+5
9
Showing 10 of 10 rows

Other info

Follow for update