Accelerating Quadratic Optimization with Reinforcement Learning

About

First-order methods for quadratic optimization such as OSQP are widely used for large-scale machine learning and embedded optimal control, where many related problems must be rapidly solved. These methods face two persistent challenges: manual hyperparameter tuning and convergence time to high-accuracy solutions. To address these, we explore how Reinforcement Learning (RL) can learn a policy to tune parameters to accelerate convergence. In experiments with well-known QP benchmarks we find that our RL policy, RLQP, significantly outperforms state-of-the-art QP solvers by up to 3x. RLQP generalizes surprisingly well to previously unseen problems with varying dimension and structure from different applications, including the QPLIB, Netlib LP and Maros-Meszaros problems. Code for RLQP is available at https://github.com/berkeleyautomation/rlqp.

Jeffrey Ichnowski, Paras Jain, Bartolomeo Stellato, Goran Banjac, Michael Luo, Francesco Borrelli, Joseph E. Gonzalez, Ion Stoica, Ken Goldberg • 2021
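
To make the idea in the abstract concrete, here is a minimal sketch of an OSQP-style ADMM loop in which the step size is chosen by a pluggable policy. It assumes the tuned parameter is the ADMM step size rho; the abstract only says "parameters". The names solve_qp_admm, constant_policy, and residual_balancing_policy are hypothetical and are not part of OSQP or the RLQP code. The hand-written residual-balancing heuristic is only a stand-in that marks where a learned policy could plug in; it is not the trained RLQP policy.

```python
import numpy as np


def solve_qp_admm(P, q, A, l, u, rho_policy, rho=0.1, sigma=1e-6,
                  eps=1e-6, max_iter=5000, adapt_every=10):
    """Minimize 0.5*x'Px + q'x subject to l <= Ax <= u with a basic ADMM loop.

    rho_policy(rho, r_prim, r_dual) returns the next step size. This hook is
    an illustrative assumption, not the interface used by OSQP or RLQP.
    """
    n, m = P.shape[0], A.shape[0]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    for k in range(1, max_iter + 1):
        # x-update: solve the reduced linear system for the current rho.
        K = P + sigma * np.eye(n) + rho * A.T @ A
        x = np.linalg.solve(K, sigma * x - q + A.T @ (rho * z - y))
        Ax = A @ x
        # z-update: project onto the box [l, u]; y-update: dual ascent step.
        z = np.clip(Ax + y / rho, l, u)
        y = y + rho * (Ax - z)
        # Primal and dual residuals (infinity norm, unnormalized).
        r_prim = np.linalg.norm(Ax - z, np.inf)
        r_dual = np.linalg.norm(P @ x + q + A.T @ y, np.inf)
        if r_prim < eps and r_dual < eps:
            break
        # Every few iterations, ask the policy for a new step size.
        if k % adapt_every == 0:
            rho = float(np.clip(rho_policy(rho, r_prim, r_dual), 1e-6, 1e6))
    return x, k


def constant_policy(rho, r_prim, r_dual):
    """Baseline: never change rho."""
    return rho


def residual_balancing_policy(rho, r_prim, r_dual):
    """Heuristic stand-in for a learned policy: rescale rho only when the
    primal and dual residuals are badly out of balance."""
    ratio = np.sqrt(max(r_prim, 1e-12) / max(r_dual, 1e-12))
    return rho * ratio if (ratio > 5.0 or ratio < 0.2) else rho


# Small example problem; its solution is x = [0.3, 0.7].
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])

x_fixed, it_fixed = solve_qp_admm(P, q, A, l, u, constant_policy)
x_adapt, it_adapt = solve_qp_admm(P, q, A, l, u, residual_balancing_policy)
print("fixed rho:   ", x_fixed, "iterations:", it_fixed)
print("adaptive rho:", x_adapt, "iterations:", it_adapt)
```

Swapping rho_policy for a function backed by a trained network would leave the solver loop unchanged, which is the appeal of treating step-size selection as a policy. How much any particular policy helps depends on the problem, so the iteration counts printed here should not be read as representative of the reported results.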

Related benchmarks

Task	Dataset	Metric	Result	Rank
Quadratic Programming	Maros & Mészáros	Solve Time	0.00e+0	52
Quadratic Program Solving	QPLIB 15 (test)	Solving Time (s)	0.113	24
