
Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning

About

Recent works using deep learning to solve the Traveling Salesman Problem (TSP) have focused on learning construction heuristics. Such approaches find TSP solutions of good quality but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until reaching a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which, unlike previous works, can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions at a faster rate than previous state-of-the-art deep learning methods.
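For readers unfamiliar with the 2-opt operator mentioned above, the following is a minimal sketch of the classical, hand-written 2-opt local search, not the learned policy from the paper. A 2-opt move removes two edges of a tour and reconnects it by reversing the segment between them. Here every improving move is accepted greedily, whereas the paper trains a stochastic policy to choose which move to apply. The function names (`tour_length`, `two_opt_move`, `two_opt_local_search`) and the random 20-city instance are illustrative assumptions and do not come from the authors' code.

```python
import math
import random


def tour_length(points, tour):
    """Total Euclidean length of the closed tour visiting points in tour order."""
    n = len(tour)
    return sum(
        math.dist(points[tour[k]], points[tour[(k + 1) % n]])
        for k in range(n)
    )


def two_opt_move(tour, i, j):
    """Return a new tour with the segment tour[i..j] reversed (one 2-opt move)."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def two_opt_local_search(points, tour):
    """Greedily apply improving 2-opt moves until no move shortens the tour."""
    best = list(tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = two_opt_move(best, i, j)
                cand_len = tour_length(points, candidate)
                # Small tolerance avoids looping on floating-point noise.
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best, best_len


# Hypothetical usage: start from a random tour on 20 uniform random cities.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(20)]
initial = list(range(20))
random.shuffle(initial)
final_tour, final_len = two_opt_local_search(points, initial)
```

This sketch recomputes the full tour length for every candidate, which keeps it short but costs O(n) per move. A practical implementation would evaluate only the change in length from the two swapped edges.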

Paulo R. de O. da Costa, Jason Rhuggenaath, Yingqian Zhang, Alp Akcay • 2020

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Traveling Salesman Problem (TSP) | TSP n=100 10K instances (test) | Objective Value: 7.79 | 52 |
| Traveling Salesperson Problem | TSP n=100 (train) | Objective Value: 7.87 | 26 |
| Traveling Salesman Problem | TSP N=20 10,000 instances (test) | Objective Value: 3.83 | 16 |
| Traveling Salesman Problem | TSP N=50 10,000 instances (test) | Objective Value: 5.7 | 16 |
| Traveling Salesman Problem | Uniform Euclidean TSP n = 50 | Tour Cost: 5.9109 | 15 |
| Traveling Salesman Problem | Uniform Euclidean TSP n = 100 | Solution Cost: 7.8201 | 15 |
| Traveling Salesman Problem | TSPLIB-gen n = 100 | Cost: 5.9674 | 15 |
| Traveling Salesman Problem | Uniform Euclidean TSP n = 500 | Tour Cost (N=500): 18.5447 | 15 |
| Traveling Salesperson Problem | TSPLIB Real-world instances 1.0 | Optimality Gap (%): 0.0023 | 12 |
| Capacitated Vehicle Routing Problem | CVRP n=100 (train) | Objective Value: 16.03 | 7 |
Showing 10 of 11 rows
