
PISTO: Proximal Inference for Stochastic Trajectory Optimization

About

Stochastic trajectory optimization methods like STOMP enable planning with non-differentiable costs, offering substantial flexibility over gradient-based approaches. We show that STOMP implicitly minimizes the KL divergence from a Boltzmann trajectory distribution, revealing an elegant Variational Inference (VI) structure underlying its updates. Building on this insight, we propose the Proximal Inference for Stochastic Trajectory Optimization (PISTO) algorithm that stabilizes the updates by augmenting the objective with a KL regularization between successive Gaussian proposals. This proximal formulation admits a trust-region interpretation and yields closed-form mean updates computable as expectations under a surrogate distribution. We estimate these expectations via importance-weighted Monte Carlo sampling, producing a simple, derivative-free algorithm that inherits STOMP's ability to handle non-differentiable and discontinuous costs without modification. On robot arm motion planning benchmarks, PISTO achieves an 89% success rate, outperforming CHOMP (63%) and STOMP (68%), while producing shorter, smoother paths at twice the speed of competing stochastic methods. We further validate PISTO on contact-rich MuJoCo locomotion and manipulation tasks, where it consistently outperforms both CEM and MPPI baselines in reward.
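
The abstract does not state the update rule itself, so the following is only a minimal sketch of what an importance-weighted, KL-regularized mean update of this kind could look like, not the authors' implementation. It assumes an objective of the form KL(q || p*) + (1/eta) KL(q || q_k), with p*(x) proportional to exp(-S(x)/lam), whose unconstrained minimizer is proportional to q_k(x)^(1/(1+eta)) exp(-eta S(x) / ((1+eta) lam)); the new mean is taken as the expectation under that surrogate, estimated with self-normalized importance weights on samples from q_k. The names proximal_mean_update, l1_cost, lam, eta, and the isotropic covariance sigma^2 I are all hypothetical choices made for illustration.

```python
import numpy as np

def proximal_mean_update(mu, sigma, cost_fn, lam=1.0, eta=1.0,
                         n_samples=256, rng=None):
    """One derivative-free mean update (illustrative sketch, assumed form)."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample candidates from the current proposal q_k = N(mu, sigma^2 I).
    eps = rng.standard_normal((n_samples, mu.size))
    samples = mu + sigma * eps
    # The cost is only evaluated, never differentiated.
    costs = np.array([cost_fn(x) for x in samples])
    # log q_k(x) up to an additive constant that cancels after normalization.
    log_q = -0.5 * np.sum(eps ** 2, axis=1)
    # Assumed surrogate: q_k^(1/(1+eta)) * exp(-eta * S / ((1+eta) * lam)).
    # Log importance weight = log(surrogate / q_k), up to a constant.
    beta = eta / (1.0 + eta)
    log_w = -beta * (costs / lam + log_q)
    log_w -= log_w.max()  # shift for numerical stability
    w = np.exp(log_w)
    w /= w.sum()  # self-normalized importance weights
    # Monte Carlo estimate of the surrogate mean.
    return w @ samples

def l1_cost(x):
    """Non-differentiable toy cost with its minimum at x = 1 in every coordinate."""
    return float(np.abs(x - 1.0).sum())
```

Under these assumptions, a larger eta weakens the KL penalty and lets the mean move further per iteration, while a smaller eta keeps successive proposals close, which is the trust-region behavior the abstract describes. Repeatedly calling proximal_mean_update with l1_cost from a zero initial mean should drive the mean toward the minimizer without any gradient information.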

Hongzhe Yu, Zinuo Chang, Yongxin Chen • 2026

Related benchmarks

Task                     Dataset            Metric                      Result   Rank
Trajectory Optimization  Push T             Time (s)                    134.6    8
Trajectory Optimization  Walker2D           Computational Time (s)      65.99    8
Trajectory Optimization  Humanoid Standup   Computational Time (s)      65.29    8
Motion Planning          7-DOF Manipulator  Success Rate (%)            88.57    4
Trajectory Optimization  Hopper             Reward (Per Step)           1.2645   3
Trajectory Optimization  HumanoidRun        Cumulative Reward per Step  1.3385   3
