Long-Horizon Manipulation via Trace-Conditioned VLA Planning
About
Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step, progress-dependent, and brittle to compounding execution errors. We present LoHo-Manip, a modular framework that scales short-horizon VLA execution to long-horizon instruction following via a dedicated task-management VLM. The manager is decoupled from the executor and is invoked in a receding-horizon manner: given the current observation, it predicts a progress-aware remaining plan that combines (i) a subtask sequence with an explicit done + remaining split as lightweight language memory, and (ii) a visual trace: a compact 2D keypoint trajectory prompt specifying where to go and what to approach next. The executor VLA is adapted to condition on the rendered trace, thereby turning long-horizon decision-making into repeated local control by following the trace. Crucially, predicting the remaining plan at each step yields an implicit closed loop: failed steps persist in subsequent outputs, and traces update accordingly, enabling automatic continuation and replanning without hand-crafted recovery logic or brittle visual-history buffers. Extensive experiments spanning embodied planning, long-horizon reasoning, trajectory prediction, and end-to-end manipulation in simulation and on a real Franka robot demonstrate strong gains in long-horizon success, robustness, and out-of-distribution generalization. Project page: https://www.liuisabella.com/LoHoManip
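
The receding-horizon loop described above can be illustrated with a minimal toy sketch. Everything here is hypothetical and not taken from the LoHo-Manip code: `RemainingPlan`, `run_receding_horizon`, `ToyWorld`, and `toy_manager` are illustrative names, the "observation" is just a count of completed subtasks, and the manager and executor are stand-in functions rather than a VLM and a VLA. The point is only to show how re-predicting the remaining plan at every step makes a failed subtask reappear without any explicit recovery logic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class RemainingPlan:
    """Hypothetical container for one manager output."""
    done: List[str]        # subtasks the manager judges complete
    remaining: List[str]   # subtasks still to do (language memory)
    trace: List[Point]     # 2D keypoints for the next subtask


def run_receding_horizon(
    manager: Callable[[int], RemainingPlan],
    executor: Callable[[int, List[Point]], None],
    observe: Callable[[], int],
    max_steps: int = 20,
) -> Tuple[bool, List[RemainingPlan]]:
    """Re-plan from the current observation before every short execution."""
    history: List[RemainingPlan] = []
    for _ in range(max_steps):
        obs = observe()
        plan = manager(obs)          # progress-aware remaining plan
        history.append(plan)
        if not plan.remaining:       # nothing left: task finished
            return True, history
        executor(obs, plan.trace)    # short-horizon control along the trace
    return False, history


# --- Toy stand-ins (assumptions for illustration only) ---
TASK = ["pick cup", "place cup on shelf", "close drawer"]


class ToyWorld:
    def __init__(self) -> None:
        self.completed = 0   # number of subtasks actually finished
        self.attempts = 0

    def observe(self) -> int:
        return self.completed

    def act(self, obs: int, trace: List[Point]) -> None:
        self.attempts += 1
        if self.attempts == 2:   # simulate one execution failure
            return
        self.completed += 1


def toy_manager(obs: int) -> RemainingPlan:
    remaining = TASK[obs:]
    # Dummy two-point trace for the next subtask; empty when done.
    trace = [(0.1 * obs, 0.0), (0.1 * obs + 0.1, 0.2)] if remaining else []
    return RemainingPlan(done=TASK[:obs], remaining=remaining, trace=trace)


world = ToyWorld()
success, history = run_receding_horizon(toy_manager, world.act, world.observe)
```

In this toy run the second execution attempt fails, so the manager's next output lists the same remaining subtasks again and the loop simply retries. The loop itself never inspects whether the executor succeeded; it only re-plans from the new observation.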
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robot Manipulation | LIBERO | Object Achievement | 98.6 | 957 |
| Embodied AI Task Planning | EB-ALFRED | Average Score | 38 | 72 |
| Embodied AI | EmbodiedBench EB-Habitat | Base Score | 78 | 53 |
| Egocentric daily-task planning | EgoPlanBench2 | Overall Success Rate | 56.7 | 44 |
| Long-horizon reasoning for robotic manipulation | RoboVQA | B-1 Score | 75.1 | 28 |
| Trajectory Prediction | ShareRobot-T | DFD | 0.2309 | 5 |
| Trajectory Prediction | VABench-V | DFD | 0.2123 | 5 |
| Robotic Manipulation Reasoning | VLABench | In Distribution Accuracy | 54 | 3 |
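
For readers unfamiliar with the DFD metric in the trajectory-prediction rows: assuming DFD denotes the discrete Fréchet distance between a predicted and a reference 2D trace (the page does not spell this out), it can be computed with the standard dynamic program below. This is a generic sketch, not the benchmarks' evaluation code; any coordinate normalization or resampling they apply is not shown, and `discrete_frechet` is an illustrative name.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Fréchet distance between two polylines of 2D keypoints."""
    if not p or not q:
        raise ValueError("both traces must contain at least one point")
    n, m = len(p), len(q)
    # ca[i][j] = coupling distance between prefixes p[:i+1] and q[:j+1]
    ca: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # Euclidean distance (Python 3.8+)
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


# Two parallel horizontal traces one unit apart.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ref = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
dfd = discrete_frechet(pred, ref)
```

Lower values mean the predicted trace stays closer to the reference when both are traversed in order, which is why DFD is sensitive to the sequencing of keypoints and not just their positions.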