Posterior Optimization with Clipped Objective for Bridging Efficiency and Stability in Generative Policy Learning
About
Expressive generative models have advanced robotic manipulation by capturing complex, multi-modal action distributions over temporally extended trajectories. However, fine-tuning these policies via reinforcement learning (RL) remains challenging due to instability and sample inefficiency. We introduce Posterior Optimization with Clipped Objective (POCO), a principled RL framework that formulates policy improvement as a posterior inference problem tailored to temporal action chunks. Through an Expectation-Maximization procedure, POCO distills a reward-weighted implicit posterior into the policy without likelihood estimation. Furthermore, POCO adopts an offline-to-online paradigm that anchors online exploration to pre-trained priors, and its model-agnostic design scales to fine-tuning large vision-language-action (VLA) models without architectural modifications. Evaluations across 7 simulation benchmarks and 4 contact-rich real-world tasks demonstrate that POCO prevents catastrophic policy collapse, outperforms state-of-the-art baselines, and achieves a 96.7% success rate on real-world tasks. Videos are available at our project website: https://cccedric.github.io/poco/.
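The abstract does not spell out POCO's update equations, so the sketch below is only a minimal illustration of how a reward-weighted EM step over action chunks could look under the stated constraints (implicit posterior, no likelihood estimation, a clipped update). The exp(R/beta) weighting, the weight clipping as the "clipped objective", the toy denoising loss, and all names (`TinyChunkDenoiser`, `em_style_update`, `beta`, `w_max`) are assumptions for illustration, not the paper's actual method.

```python
import torch
import torch.nn as nn

class TinyChunkDenoiser(nn.Module):
    """Stand-in generative policy: a toy one-level denoiser that predicts
    the noise added to a flattened action chunk, conditioned on state."""
    def __init__(self, state_dim: int, chunk_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + chunk_dim, 128), nn.ReLU(),
            nn.Linear(128, chunk_dim))

    def denoising_loss(self, states, chunks):
        noise = torch.randn_like(chunks)
        noisy = chunks + noise  # single fixed noise level, for brevity
        pred = self.net(torch.cat([states, noisy], dim=-1))
        return ((pred - noise) ** 2).mean(dim=-1)  # per-sample loss, shape (B,)

def em_style_update(policy, states, chunks, rewards, beta=1.0, w_max=5.0):
    # E-step: implicit reward-weighted posterior over the sampled chunks,
    # softmax(R / beta) in control-as-inference style, then clipped so no
    # single sample can dominate the update (one reading of "clipped";
    # POCO's actual clipping mechanism is not given in the abstract).
    w = torch.softmax(rewards / beta, dim=0)
    w = torch.clamp(w, max=w_max / len(rewards)).detach()
    # M-step: distill the posterior into the generative policy via a
    # weighted denoising regression; no action log-likelihoods are computed.
    return (w * policy.denoising_loss(states, chunks)).sum()

# Usage: one gradient step on a batch of 64 sampled action chunks
# (8 future steps x 7 DoF, flattened).
policy = TinyChunkDenoiser(state_dim=10, chunk_dim=8 * 7)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
states, chunks = torch.randn(64, 10), torch.randn(64, 8 * 7)
rewards = torch.randn(64)
loss = em_style_update(policy, states, chunks, rewards)
opt.zero_grad(); loss.backward(); opt.step()
```

The weighted denoising loss stands in for the likelihood-free distillation the abstract describes, since diffusion-style policies do not expose tractable action likelihoods; any real implementation would also sample noise levels and anchor updates to the pre-trained prior.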
Related benchmarks
| Task | Dataset | Success Rate (%) | Rank |
|---|---|---|---|
| Assemble SSD | Real-world Robotic Manipulation Assemble SSD | 96.7 | 4 |
| Hang Keychain | Hang Keychain Real-world | 86.7 | 4 |
| Insert USB | Real-world Robotic Manipulation (Insert USB) | 90.0 | 4 |
| Pick Cube | Real-world Robotic Manipulation Pick Cube | 100.0 | 4 |
| Pick Pen | Pick Pen Real-world | 93.3 | 4 |
| Route Cable | Real-world Robotic Manipulation Route Cable | 100.0 | 4 |