LLM-Guided Future Hypotheses for Horizon-Aware Exploration in Multi-Step Robot Manipulation

About

Multi-step robot manipulation requires acting under uncertainty about how the scene will evolve, which makes exploration and policy adaptation challenging. We study whether short-horizon, task-consistent future videos can provide useful structured priors for control and reinforcement-learning fine-tuning. We formalize this idea through Future-Experience Conditioning (FEC), a simple interface that conditions closed-loop policies on a latent representation of a short future video. In our simulation setup, future clips are generated in three stages: (1) an LLM reasoner operating over a task ontology initialized from the current scene state; (2) a robot-free digital-twin rollout of the intended object motion; and (3) a mask-free video diffusion model that synthesizes a robot-consistent future clip without requiring segmentation at inference. We instantiate this future-conditioning interface primarily with BC and BC+RL, and compare against a future-conditioned Streaming Flow Policy (SFP) baseline on RoboCasa and CALVIN under four conditions: NoFuture, GTFuture, GenFuture, and WrongFuture. Generated futures improve performance over no-future conditioning, mismatched futures degrade it, and our BC+RL instantiation achieves the strongest overall results. A BC+RL learning-curve analysis averaged across 8 CALVIN tasks further shows that GTFuture improves fastest, GenFuture improves earlier and reaches a higher level than NoFuture, and WrongFuture remains at zero throughout training. These results suggest that short-horizon future videos can serve as useful structured priors for exploration and policy adaptation under imperfect future predictions. https://enact2026.github.io/
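
The conditioning interface and the four evaluation conditions named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of a zero vector for NoFuture, the choice of a latent from a different task for WrongFuture, and the plain concatenation of observation features with the future latent are all assumptions made for illustration only.

```python
from enum import Enum
from typing import Dict, List, Sequence


class FutureMode(Enum):
    """The four conditions named in the abstract."""
    NO_FUTURE = "NoFuture"
    GT_FUTURE = "GTFuture"
    GEN_FUTURE = "GenFuture"
    WRONG_FUTURE = "WrongFuture"


def select_future_latent(
    mode: FutureMode,
    task: str,
    gt_latents: Dict[str, Sequence[float]],
    gen_latents: Dict[str, Sequence[float]],
    latent_dim: int,
) -> List[float]:
    """Pick the future latent that conditions the policy (hypothetical helper)."""
    if mode is FutureMode.NO_FUTURE:
        # Assumption: "no future" is encoded as an all-zero latent.
        return [0.0] * latent_dim
    if mode is FutureMode.GT_FUTURE:
        return list(gt_latents[task])
    if mode is FutureMode.GEN_FUTURE:
        return list(gen_latents[task])
    # Assumption: a "wrong" future is a generated latent from a different task,
    # here the first other task in sorted order.
    other_task = next(t for t in sorted(gen_latents) if t != task)
    return list(gen_latents[other_task])


def policy_input(obs_features: Sequence[float], future_latent: Sequence[float]) -> List[float]:
    """Assumption: the policy sees observation features concatenated with the future latent."""
    return list(obs_features) + list(future_latent)


if __name__ == "__main__":
    # Toy latents for two task names; the values are made up.
    gt = {"open_drawer": [1.0, 0.0], "turn_on_led": [0.0, 1.0]}
    gen = {"open_drawer": [0.9, 0.1], "turn_on_led": [0.1, 0.9]}
    for mode in FutureMode:
        z = select_future_latent(mode, "open_drawer", gt, gen, latent_dim=2)
        print(mode.value, policy_input([0.5], z))
```

Under these assumptions, the same policy network can be evaluated in all four conditions by changing only which latent is passed in, which is what makes the comparison between matched and mismatched futures possible.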

Mohammad Khoshnazar, Andrew Melnik, Michael Beetz • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Robot Manipulation | RoboCasa CloseDrawer | Success Rate | 82.3 | 12 |
| turn off lightbulb | CALVIN | Success Rate | 100 | 10 |
| open drawer | CALVIN | Success Rate | 100 | 7 |
| turn on LED | CALVIN | Success Rate | 100 | 7 |
| Close Drawer | CALVIN | Success Rate | 96.1 | 7 |
| turn on lightbulb | CALVIN | Success Rate | 100 | 7 |
| move slider left | CALVIN | Success Rate | 50 | 7 |
| push_into_drawer | CALVIN | Success Rate | 66.7 | 4 |
| turn_off_led | CALVIN | Success Rate | 100 | 4 |
| OpenSingleDoor | RoboCasa | NoF Count | 46.1 | 3 |
