
Move-Then-Operate: Behavioral Phasing for Human-Like Robotic Manipulation

About

We present Move-Then-Operate, a vision-language-action (VLA) framework that explicitly decouples robotic manipulation into two distinct behavioral phases: coarse relocation (move) and contact-critical interaction (operate). Unlike monolithic policies that conflate these heterogeneous regimes, our architecture employs a dual-expert policy routed by a learnable phase selector, introducing a structural inductive bias that isolates phase-specific dynamics. Phase labels are generated automatically by an MLLM-based pipeline conditioned on lightweight contextual cues, such as end-effector velocity and subtask decomposition, to ensure alignment with human motor patterns. Evaluated on the RoboTwin2 benchmark, our method achieves an average success rate of $68.9\%$, outperforming the monolithic $\pi_0$ baseline by $24\%$. It matches or exceeds models trained on $10\times$ more data and reaches peak performance in $40\%$ fewer training steps, demonstrating that architectural disentanglement of the move and operate phases is an effective and efficient strategy for high-precision manipulation.
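The routing idea described above can be illustrated with a toy sketch. It is not the authors' implementation: the logistic gate, the two-element feature vector (distance to target, end-effector speed), the hand-picked weights, the hard 0.5 threshold, and the linear stand-in experts are all assumptions made for illustration. In the paper the experts are VLA action experts and the selector is learned.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Plain dot product, to keep the sketch dependency-free."""
    return sum(wi * xi for wi, xi in zip(w, x))


@dataclass
class PhaseSelector:
    """Hypothetical logistic gate returning P(phase == 'operate')."""
    weights: Vector
    bias: float = 0.0

    def prob_operate(self, features: Sequence[float]) -> float:
        z = _dot(self.weights, features) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


@dataclass
class DualExpertPolicy:
    """Routes each observation to exactly one of two phase experts."""
    selector: PhaseSelector
    move_expert: Callable[[Sequence[float]], Vector]
    operate_expert: Callable[[Sequence[float]], Vector]
    threshold: float = 0.5  # assumed hard-routing threshold

    def act(self, features: Sequence[float]) -> Tuple[str, Vector]:
        p = self.selector.prob_operate(features)
        if p >= self.threshold:
            return "operate", self.operate_expert(features)
        return "move", self.move_expert(features)


# Assumed features: [distance_to_target, end_effector_speed].
# Negative weights make the gate favor "operate" when close and slow.
policy = DualExpertPolicy(
    selector=PhaseSelector(weights=[-4.0, -2.0], bias=2.0),
    move_expert=lambda f: [0.5 * f[0]],      # large, coarse step
    operate_expert=lambda f: [0.05 * f[0]],  # small, precise step
)

far_phase, far_action = policy.act([1.0, 0.5])     # far from target, moving fast
near_phase, near_action = policy.act([0.05, 0.02])  # near target, almost still
print(far_phase, near_phase)
```

Hard routing keeps each expert's inputs restricted to a single phase, which is one simple way to realize the structural separation the abstract describes; a soft mixture of the two expert outputs would be an alternative design.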

Haoming Xu, Lei Lei, Jie Gu, Chu Tang, Jingmin Chen, Ruiqi Wang • 2026

Related benchmarks

Task                   Dataset              Metric        Result  Rank
Robotic Manipulation   RoboTwin2            -             -       10
Place empty cup        RoboTwin 2.0 (test)  Success Rate  29      8
Click Bell             RoboTwin2 (test)     Success Rate  99      3
Place Bread Basket     RoboTwin2 (test)     Success Rate  49      3
Press Stapler          RoboTwin2 (test)     Success Rate  93      3
Move Pillbottle pad    RoboTwin2 (test)     Success Rate  16      3
Click Alarmclock       RoboTwin2 (test)     Success Rate  91      3
Place Burger Fries     RoboTwin2 (test)     Success Rate  61      3
Place Cans Plasticbox  RoboTwin2 (test)     Success Rate  16      3
