PISTO: Proximal Inference for Stochastic Trajectory Optimization

About

Stochastic trajectory optimization methods like STOMP enable planning with non-differentiable costs, offering substantial flexibility over gradient-based approaches. We show that STOMP implicitly minimizes the KL divergence from a Boltzmann trajectory distribution, revealing a variational inference (VI) structure underlying its updates. Building on this insight, we propose the Proximal Inference for Stochastic Trajectory Optimization (PISTO) algorithm, which stabilizes the updates by augmenting the objective with a KL regularization between successive Gaussian proposals. This proximal formulation admits a trust-region interpretation and yields closed-form mean updates computable as expectations under a surrogate distribution. We estimate these expectations via importance-weighted Monte Carlo sampling, producing a simple, derivative-free algorithm that inherits STOMP's ability to handle non-differentiable and discontinuous costs without modification. On robot arm motion planning benchmarks, PISTO achieves an 89% success rate, outperforming CHOMP (63%) and STOMP (68%), while producing shorter, smoother paths at twice the speed of competing stochastic methods. We further validate PISTO on contact-rich MuJoCo locomotion and manipulation tasks, where it consistently outperforms both CEM and MPPI baselines in reward.
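
The abstract does not give the exact update rule, so the following is only a minimal illustrative sketch of the general idea it describes: sample from the current Gaussian proposal, weight the samples by a Boltzmann factor of their cost, and move the mean toward the importance-weighted average. It is not the authors' implementation. The names `weighted_mean_update`, `nonsmooth_cost`, and `optimize`, and the parameters `temperature` and `step` (a damping factor standing in for the proximal, trust-region effect), are all assumptions made for illustration.

```python
import math
import random


def weighted_mean_update(mean, sigma, cost, n_samples=256,
                         temperature=1.0, step=0.5, rng=None):
    """One derivative-free update of a Gaussian proposal mean (hypothetical sketch).

    Samples are drawn from N(mean, sigma^2 I), weighted by exp(-cost / temperature),
    and the mean moves a fraction `step` toward the weighted average.
    """
    rng = rng or random.Random(0)
    samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
               for _ in range(n_samples)]
    costs = [cost(x) for x in samples]
    c_min = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(weights)
    # Self-normalized importance-weighted estimate of the target mean.
    target = [sum(w * x[d] for w, x in zip(weights, samples)) / total
              for d in range(len(mean))]
    # Damped step: step < 1 keeps the new mean close to the previous one.
    return [m + step * (t - m) for m, t in zip(mean, target)]


def nonsmooth_cost(x):
    # Toy cost (assumption): L1 distance to a goal plus a discontinuous penalty.
    goal = [1.0, -2.0]
    penalty = 5.0 if x[0] < 0.0 else 0.0
    return sum(abs(a - b) for a, b in zip(x, goal)) + penalty


def optimize(iterations=60, seed=0):
    rng = random.Random(seed)
    mean = [-1.0, 1.0]
    for _ in range(iterations):
        mean = weighted_mean_update(mean, sigma=0.5, cost=nonsmooth_cost,
                                    temperature=0.5, rng=rng)
    return mean
```

Calling `optimize()` runs the update on a two-dimensional toy problem whose cost is neither differentiable nor continuous; no gradient of `nonsmooth_cost` is ever evaluated, which is the property the abstract emphasizes.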

Hongzhe Yu, Zinuo Chang, Yongxin Chen • 2026

Related benchmarks

Task	Dataset	Result
Trajectory Optimization	Push T	Time (s): 134.6	8
Trajectory Optimization	Walker2D	Computational Time (s): 65.99	8
Trajectory Optimization	Humanoid Standup	Computational Time (s): 65.29	8
Motion Planning	7-DOF Manipulator	Success Rate: 88.57	4
Trajectory Optimization	Hopper	Reward (Per Step): 1.2645	3
Trajectory Optimization	HumanoidRun	Cumulative Reward per Step: 1.3385	3
