STARRY: Spatial-Temporal Action-Centric World Modeling for Robotic Manipulation

About

Robotic manipulation requires reasoning about future spatial-temporal interactions and geometric constraints, yet existing Vision-Language-Action (VLA) policies often leave predictive representation weakly coupled with action execution, causing failures in tasks requiring precise spatial-temporal coordination. We propose STARRY, a world-model-enhanced action-generation policy that aligns spatial-temporal prediction and action generation by jointly denoising future spatial-temporal latents and actions through a unified diffusion process. To bridge 2D visual tokens and 3D metric control, STARRY introduces Geometry-Aware Selective Attention Modulation (GASAM), which converts predicted depth and end-effector geometry into token-aligned weights for selective action-attention modulation. On RoboTwin 2.0, STARRY achieves 93.82% / 93.30% average success under Clean and Randomized settings across 50 bimanual tasks. Real-world experiments show that STARRY improves average success from 42.5% to 70.8% compared with $\pi_{0.5}$. These results demonstrate the effectiveness of action-centric spatial-temporal world modeling for spatially and temporally demanding robotic manipulation.
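The abstract describes GASAM only at a high level, so the following is a minimal illustrative sketch of the general idea (turning depth and end-effector geometry into per-token weights that bias action attention), not the authors' implementation. Everything named here is an assumption made for illustration: the pinhole intrinsics `fx, fy, cx, cy`, the patch size, the Gaussian distance kernel with width `sigma`, and the choice to add log-weights to the attention logits.

```python
import numpy as np

def geometry_weights(depth, ee_xyz, fx, fy, cx, cy, patch=16, sigma=0.10):
    """Hypothetical per-patch weights in (0, 1] that peak near the end-effector.

    depth  : (H, W) predicted metric depth, metres
    ee_xyz : (3,) end-effector position in the camera frame, metres
    Returns a flat (gh * gw,) array in row-major patch order.
    """
    H, W = depth.shape
    gh, gw = H // patch, W // patch
    # Pixel coordinates of each patch centre.
    v = (np.arange(gh) + 0.5) * patch
    u = (np.arange(gw) + 0.5) * patch
    uu, vv = np.meshgrid(u, v)  # both (gh, gw)
    # Mean depth inside each patch.
    z = depth[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch).mean(axis=(1, 3))
    # Pinhole back-projection of patch centres to 3D camera coordinates.
    x = (uu - cx) * z / fx
    y = (vv - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Gaussian falloff with squared distance to the end-effector (assumed kernel).
    d2 = ((pts - np.asarray(ee_xyz)) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def modulated_attention(q, k, v, w, eps=1e-6):
    """Scaled dot-product attention whose logits are biased by log(w).

    q : (n_q, d) action queries; k, v : (n_tokens, d) visual tokens;
    w : (n_tokens,) token-aligned geometry weights.
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits + np.log(w + eps)            # down-weight far-away tokens
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn
```

Adding `log(w)` to the logits is equivalent to multiplying the unnormalized attention scores by `w`, so tokens whose back-projected position lies far from the end-effector contribute less to the action queries. Other modulation schemes are equally plausible readings of the abstract.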

Yuxuan Tian, Yurun Jin, Bin Yu, Yukun Shi, Hao Wu, Chi Harold Liu, Kai Chen, Cong Huang • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Robot Manipulation | RoboTwin Clean 2.0 | Average Success Rate: 93.82% | 39 |
| Robot Manipulation | RoboTwin Randomized 2.0 | -- | 33 |
| Hand Over Vegetables | Hand Over Vegetables Real-world (Stage 1) | Success Rate: 85% | 2 |
| Hand Over Vegetables | Hand Over Vegetables Real-world (Stage 2) | Success Rate: 70% | 2 |
| Tidy Up Room | Tidy Up Room Real-world (Stage 1) | Success Rate: 75% | 2 |
| Tidy Up Room | Tidy Up Room Real-world (Stage 2) | Success Rate: 65% | 2 |
| Wash Baby Bottle | Wash Baby Bottle Real-world (Stage 1) | Success Rate: 70% | 2 |
| Wash Baby Bottle | Wash Baby Bottle Real-world (Stage 2) | Success Rate: 60% | 2 |
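As a quick consistency check, the six real-world stage results listed above can be averaged and compared with the 70.8% real-world average quoted in the abstract. The sketch below assumes all six entries are percentages on a 0 to 100 scale (so the Tidy Up Room Stage 1 entry is read as 75) and that the abstract's figure is an unweighted mean over these six stages. Neither assumption is stated on the page.

```python
# Real-world success rates (%) per task and stage, as listed in the table.
# Assumption: Tidy Up Room Stage 1 is 75 on the same 0-100 scale as the others.
stage_success = {
    ("Hand Over Vegetables", 1): 85.0,
    ("Hand Over Vegetables", 2): 70.0,
    ("Tidy Up Room", 1): 75.0,
    ("Tidy Up Room", 2): 65.0,
    ("Wash Baby Bottle", 1): 70.0,
    ("Wash Baby Bottle", 2): 60.0,
}

# Unweighted mean over the six task stages.
mean_success = sum(stage_success.values()) / len(stage_success)

# Baseline average reported in the abstract for pi_0.5.
baseline = 42.5
gain_points = mean_success - baseline

print(round(mean_success, 1))  # 70.8
print(round(gain_points, 1))   # 28.3
```

Under these assumptions the mean comes out at 70.8%, which matches the abstract's figure and corresponds to a gain of about 28.3 percentage points over the reported baseline.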