Planning in Branch-and-Bound: Model-Based Reinforcement Learning for Exact Combinatorial Optimization
About
Mixed-Integer Linear Programming (MILP) lies at the core of many real-world combinatorial optimization (CO) problems, traditionally solved by branch-and-bound (B&B). A key driver influencing B&B solvers efficiency is the variable selection heuristic that guides branching decisions. Looking to move beyond static, hand-crafted heuristics, recent work has explored adapting traditional reinforcement learning (RL) algorithms to the B&B setting, aiming to learn branching strategies tailored to specific MILP distributions. In parallel, RL agents have achieved remarkable success in board games, a very specific type of combinatorial problems, by leveraging environment simulators to plan via Monte Carlo Tree Search (MCTS). Building on these developments, we introduce Plan-and-Branch-and-Bound (PlanB&B), a model-based reinforcement learning (MBRL) agent that leverages a learned internal model of the B&B dynamics to discover improved branching strategies. Computational experiments empirically validate our approach, with our MBRL branching agent outperforming previous state-of-the-art RL methods across four standard MILP benchmarks.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Mixed Integer Linear Programming Branching | MILP Standard Suite (test) | Set Covering Time (s)0.87 | 10 | |
| Mixed Integer Linear Programming Branching | MILP Suite Transfer instances Standard | Set Covering Time (s)46.2 | 10 | |
| MILP Solving | Multiple knapsack (train) | Search Tree Nodes220 | 9 | |
| MILP Solving | Maximum independent set Transfer | Nodes Explored2.85e+3 | 9 | |
| MILP Solving | Multiple knapsack Transfer | Nodes Explored1.36e+4 | 9 | |
| MILP Solving | Combinatorial auction (train) | Nodes84.7 | 9 | |
| MILP Solving | Combinatorial auction Transfer | Nodes Explored1.67e+3 | 9 | |
| MILP Solving | Maximum independent set (train) | Nodes Explored44.8 | 9 | |
| MILP Solving | Set covering Transfer | Search Tree Nodes5.87e+3 | 9 | |
| MILP Solving | Set covering (train) | Search Tree Nodes186.2 | 9 |