Conservative Offline Robot Policy Learning via Posterior-Transition Reweighting
About
Offline post-training adapts a pretrained robot policy to a target dataset by supervised regression on recorded actions. In practice, robot datasets are heterogeneous: they mix embodiments, camera setups, and demonstrations of varying quality, so many trajectories reflect recovery behavior, inconsistent operator skill, or weakly informative supervision. Uniform post-training gives equal credit to all samples and can therefore average over conflicting or low-attribution data. We propose Posterior-Transition Reweighting (PTR), a reward-free and conservative post-training method that decides how much each training sample should influence the supervised update. For each sample, PTR encodes the observed post-action consequence as a latent target, inserts it into a candidate pool of mismatched targets, and uses a separate transition scorer to estimate a softmax identification posterior over target indices. The posterior-to-uniform ratio defines the PTR score, which is converted into a clipped-and-mixed weight and applied to the original action objective through self-normalized weighted regression. This construction requires no tractable policy likelihood and is compatible with both diffusion and flow-matching action heads. Rather than uniformly trusting all recorded supervision, PTR reallocates credit according to how attributable each sample's post-action consequence is under the current representation, improving conservative offline adaptation to heterogeneous robot data.
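The weighting pipeline described above (identification posterior, posterior-to-uniform ratio, clipped-and-mixed weight, self-normalized weighted regression) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the scorer architecture, the clipping bounds, or the mixing rule, so the function names `ptr_weights` and `self_normalized_loss`, the parameters `clip_min`, `clip_max`, and `mix`, and the choice to mix the clipped ratio with a uniform weight of 1 are all hypothetical.

```python
import numpy as np

def ptr_weights(scores, true_index=0, clip_min=0.0, clip_max=5.0, mix=0.5):
    """Turn transition-scorer logits into per-sample weights.

    scores: (B, K) array. Row b holds the scorer's logits for sample b
            against a pool of K candidate targets; column `true_index`
            is the sample's own (matched) post-action target.
    clip_min, clip_max, mix: assumed hyperparameters, not from the paper.
    """
    scores = np.asarray(scores, dtype=float)
    # Numerically stable softmax over the candidate pool.
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Identification posterior assigned to the matched target.
    posterior = probs[:, true_index]
    # Posterior-to-uniform ratio: the uniform posterior is 1/K.
    k = scores.shape[1]
    ratio = posterior * k
    # Clip the ratio, then mix with the uniform weight 1 (assumed form).
    clipped = np.clip(ratio, clip_min, clip_max)
    return mix * clipped + (1.0 - mix)

def self_normalized_loss(per_sample_loss, weights):
    """Weighted mean of per-sample action losses, normalized by the weight sum."""
    per_sample_loss = np.asarray(per_sample_loss, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float((weights * per_sample_loss).sum() / weights.sum())
```

In this sketch, a sample whose matched target is indistinguishable from the mismatched ones receives a ratio of 1 and therefore the same weight as under uniform post-training, while a sample whose consequence is easy to identify is up-weighted up to the clipping bound. Because the weights multiply an arbitrary per-sample loss, no policy likelihood is needed, which is consistent with the abstract's claim of compatibility with diffusion and flow-matching action heads.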
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | LIBERO (test) | Average Success Rate | 97.8 | 184 |
| Robot Manipulation | LIBERO simulation | Average Success Rate | 97.8 | 36 |
| Robot Manipulation | RoboCasa simulation | Average Success Rate | 55.6 | 7 |
| Bimanual Coordination | Real Robot Bimanual Suite | Success Rate | 66.7 | 4 |
| Multi-step Sequential Manipulation | Real Robot Long-Horizon Suite | Success Rate | 65 | 4 |
| Precise Placement and Arrangement | Real Robot Spatial Suite | Success Rate | 78.3 | 4 |
| Robust Manipulation | Real Robot Robust Suite | Success Rate | 61.7 | 4 |
| Robot Manipulation | RoboCasa (test) | Pick and Place Success Rate | 38.3 | 3 |