
Latent Policy Steering through One-Step Flow Policies

About

Offline reinforcement learning (RL) allows robots to learn from offline datasets without risky exploration. Yet, offline RL's performance often hinges on a brittle trade-off between (1) return maximization, which can push policies outside the dataset support, and (2) behavioral constraints, which typically require sensitive hyperparameter tuning. Latent steering offers a structural way to stay within the dataset support during RL, but existing offline adaptations commonly approximate action values using latent-space critics learned via indirect distillation, which can lose information and hinder convergence. We propose Latent Policy Steering (LPS), which enables high-fidelity latent policy improvement by backpropagating original-action-space Q-gradients through a differentiable one-step MeanFlow policy to update a latent-action-space actor. By eliminating proxy latent critics, LPS allows an original-action-space critic to guide end-to-end latent-space optimization, while the one-step MeanFlow policy serves as a behavior-constrained generative prior. This decoupling yields a robust method that works out-of-the-box with minimal tuning. Across OGBench and real-world robotic tasks, LPS achieves state-of-the-art performance and consistently outperforms behavioral cloning and strong latent steering baselines.
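The core update described above, pushing an action-space Q-gradient back through a differentiable one-step decoder into a latent-space actor, can be illustrated with a toy one-dimensional sketch. This is not the authors' implementation: the `tanh` decoder merely stands in for a frozen one-step MeanFlow policy, the quadratic critic and the linear actor are hypothetical, and the gradients are written out by hand with the chain rule rather than by an autodiff library.

```python
import math

# Hypothetical frozen one-step generative policy (stand-in for a MeanFlow
# policy): maps a state s and latent z to a bounded action in (-1, 1).
W_Z, W_S = 1.0, 0.5  # assumed fixed decoder weights, never updated here


def decode(s, z):
    return math.tanh(W_Z * z + W_S * s)


def decode_grad_z(s, z):
    # d(action)/d(latent) for the tanh decoder.
    a = decode(s, z)
    return W_Z * (1.0 - a * a)


# Hypothetical critic defined in the ORIGINAL action space.
# It is maximal when the action equals 0.5 * s.
def q_value(s, a):
    return -(a - 0.5 * s) ** 2


def q_grad_a(s, a):
    # dQ/d(action).
    return -2.0 * (a - 0.5 * s)


# Hypothetical latent-space actor: a single scalar weight theta.
def actor(theta, s):
    return theta * s


def actor_step(theta, states, lr=0.5):
    # Gradient ascent on Q(s, decode(s, actor(s))) via the chain rule:
    # dQ/dtheta = dQ/da * da/dz * dz/dtheta. No latent-space critic is used.
    grad = 0.0
    for s in states:
        z = actor(theta, s)
        a = decode(s, z)
        grad += q_grad_a(s, a) * decode_grad_z(s, z) * s
    return theta + lr * grad / len(states)


def mean_q(theta, states):
    return sum(q_value(s, decode(s, actor(theta, s))) for s in states) / len(states)


states = [-1.0, -0.5, 0.5, 1.0]
theta = -1.0
q_before = mean_q(theta, states)
for _ in range(200):
    theta = actor_step(theta, states)
q_after = mean_q(theta, states)
```

In this sketch only `theta` is updated; the decoder stays frozen, so every action the actor can reach is one the decoder can generate. That loosely mirrors the role the abstract assigns to the one-step policy as a behavior-constrained generative prior, though a real system would use learned networks and automatic differentiation.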

Hokyun Im, Andrey Kolobov, Jianlong Fu, Youngwoon Lee • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Robot goal-reaching success rate evaluation | OGBench cube-double-play-singletask | Success Rate (%): 41 | 13 |
| Robot goal-reaching success rate evaluation | OGBench puzzle-3x3-play-sparse-singletask | Success Rate: 100 | 13 |
| Robot goal-reaching success rate evaluation | OGBench scene-play-sparse-singletask | Success Rate: 79 | 13 |
| Robot goal-reaching success rate evaluation | OGBench cube-single-play-singletask | Success Rate: 95 | 13 |
| Robot goal-reaching success rate evaluation | OGBench puzzle-4x4-play-sparse-singletask | Success Rate: 22 | 13 |
| Robot goal-reaching success rate evaluation | OGBench visual-*-task1 | Success Rate: 48 | 5 |
| plug in bulb | Real-world | Success Rate: 35 | 4 |
| pnp carrots | Real-world | Success Rate: 85 | 4 |
| Put Eggplant To Bowl | Real-world | Success Rate: 80 | 4 |
| refill tape | Real-world | Success Rate: 25 | 4 |