
Learning to Manipulate Anything: Revealing Data Scaling Laws in Bounding-Box Guided Policies

About

Diffusion-based policies show limited generalization in semantic manipulation, posing a key obstacle to the deployment of real-world robots. This limitation arises because text instructions alone are inadequate to direct the policy's attention toward the target object in complex and dynamic environments. To solve this problem, we propose leveraging bounding-box instructions to directly specify the target object, and we further investigate whether data scaling laws exist in semantic manipulation tasks. Specifically, we design a handheld segmentation device with an automated annotation pipeline, Label-UMI, which enables the efficient collection of demonstration data with semantic labels. We further propose a semantic-motion-decoupled framework that integrates object detection and a bounding-box guided diffusion policy to improve generalization and adaptability in semantic manipulation. Through extensive real-world experiments on large-scale datasets, we validate the effectiveness of the approach and reveal a power-law relationship between generalization performance and the number of bounding-box objects. Finally, we summarize an effective data collection strategy for semantic manipulation, which can achieve 85% success rates across four tasks on both seen and unseen objects. All datasets and code will be released to the community.

Yihao Wu, Jinming Ma, Junbo Tan, Yanzhao Yu, Shoujie Li, Mingliang Zhou, Diyun Xiang, Xueqian Wang • 2026
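
The abstract reports a power-law relationship between generalization performance and the number of objects but gives no functional form or fitted constants here. As a minimal sketch of how such a relationship is commonly checked, the snippet below fits y = a * x^b by least squares in log-log space. The function name `fit_power_law`, the variable names, and the data points are all hypothetical; the data is generated from a known law purely to show that the fit recovers it and is not taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).

    Returns (a, b). All xs and ys must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # The intercept in log space is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, illustrative data only (NOT results from the paper):
# a hypothetical error-like metric that shrinks as more objects are used.
num_objects = [1, 2, 4, 8, 16, 32]
true_a, true_b = 0.8, -0.5
metric = [true_a * n ** true_b for n in num_objects]

a, b = fit_power_law(num_objects, metric)
print(f"fitted a={a:.3f}, b={b:.3f}")
```

On real measurements the points would scatter around the line, and the correlation of log x with log y is a quick indicator of how well a power law describes the trend.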

Related benchmarks

Task              Dataset                             Result             Rank
Button Pressing   Button Pressing Similar Textures    Success Rate: 93   6
Button Pressing   Button Pressing Similar Shapes      Success Rate: 95   6
Drink Fetching    Drink Fetching Similar Textures     Success Rate: 89   6
Drink Fetching    Drink Fetching Similar Shapes       Success Rate: 92   6
Rubbish Disposal  Rubbish Disposal Similar Textures   Success Rate: 90   6
Rubbish Disposal  Rubbish Disposal Similar Shapes     Success Rate: 91   6
Water Pouring     Water Pouring Similar Textures      Success Rate: 91   6
Water Pouring     Water Pouring Similar Shapes        Success Rate: 88   6
Button Pressing   Button Pressing Real-world (test)   Score: 91.21       1
Drink Fetching    Drink Fetching Real-world (test)    Score: 86.45       1
Showing 10 of 12 rows
