OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation

About

Robust robotic manipulation requires not only predicting how the scene evolves over time, but also recognizing task-relevant objects in complex scenes. However, existing vision-language-action (VLA) models face two limitations: they typically act only on the current frame, and future prediction and object-aware reasoning are often learned in separate latent spaces. We propose OFlow (injecting Object-Aware Temporal Flow Matching into VLAs), a framework that addresses both limitations by unifying temporal foresight and object-aware reasoning in a shared semantic latent space. Our method forecasts future latents with temporal flow matching, factorizes them into object-aware representations that emphasize physically relevant cues while filtering out task-irrelevant variation, and conditions continuous action generation on these predictions. Integrated into VLA pipelines, OFlow enables more reliable control under distribution shifts. Extensive experiments on the LIBERO, LIBERO-Plus, MetaWorld, and SimplerEnv benchmarks, as well as on real-world tasks, demonstrate that object-aware foresight consistently improves robustness and task success.
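The abstract does not give OFlow's exact training objective, so the following is only a generic sketch of the flow-matching idea it builds on, not the authors' implementation. It assumes one common variant: a straight-line path x_t = (1 - t) * x0 + t * x1 from a noise sample x0 to a target "future latent" x1, a velocity model trained to regress the constant velocity x1 - x0, and Euler integration at inference time. All function names are hypothetical, and the learned network (which in a VLA would also be conditioned on observations and language) is replaced by an idealized "oracle" velocity so the example runs with the standard library alone.

```python
import random
from typing import Callable, List

Vec = List[float]
# Hypothetical velocity model: maps (current point x, time t) to a velocity vector.
VelocityFn = Callable[[Vec, float], Vec]


def interpolate(x0: Vec, x1: Vec, t: float) -> Vec:
    """Point on the straight-line path from noise x0 (t=0) to target latent x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Vec, x1: Vec) -> Vec:
    """Regression target: along the straight path, dx_t/dt = x1 - x0 for every t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(v_pred: VelocityFn, x0: Vec, x1: Vec, t: float) -> float:
    """Squared error between predicted and target velocity at one sampled time t."""
    xt = interpolate(x0, x1, t)
    pred = v_pred(xt, t)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt))


def euler_sample(v_pred: VelocityFn, x0: Vec, steps: int = 10) -> Vec:
    """Integrate dx/dt = v_pred(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the oracle below never divides by zero
        v = v_pred(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    random.seed(0)
    x1 = [0.5, -1.0, 2.0]  # hypothetical future latent to be forecast
    x0 = [random.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample

    def oracle(x: Vec, t: float) -> Vec:
        # Idealized velocity for a single known target: steer x toward x1 by t=1.
        return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]

    print(flow_matching_loss(oracle, x0, x1, t=0.3))  # close to 0 (float error only)
    print(euler_sample(oracle, x0, steps=10))  # close to x1
```

Because the oracle reproduces the straight-line velocity exactly, the loss is essentially zero and Euler integration lands on x1. A trained network only approximates this field, so in practice the number of integration steps trades accuracy against inference cost.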

Kuanning Wang, Ke Fan, Chenhao Qiu, Zeyu Shangguan, Yuqian Fu, Yanwei Fu, Daniel Seita, Xiangyang Xue• 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Robotic Manipulation | LIBERO | Spatial Success Rate | 98 | 527
Robotic Manipulation | LIBERO-Plus | -- | -- | 249
Robot Manipulation | SimplerEnv WidowX | Success Rate: Put Spoon on Towel | 76.8 | 98
Robotic Manipulation | MetaWorld MT50 | Success Rate (Easy, 28 Tasks) | 93.6 | 5
