BridgeACT: Bridging Human Demonstrations to Robot Actions via Unified Tool-Target Affordances

About

Learning robot manipulation from human videos is appealing due to the scale and diversity of human demonstrations, but transferring such demonstrations to executable robot behavior remains challenging. Prior work either relies on robot data for downstream adaptation or learns affordance representations that remain at the perception level and do not directly support real-world execution. We present BridgeACT, an affordance-driven framework that learns robotic manipulation directly from human videos without requiring any robot demonstration data. Our key idea is to model affordance as an embodiment-agnostic intermediate representation that bridges human demonstrations and robot actions. BridgeACT decomposes manipulation into two complementary problems: where to grasp and how to move. To this end, BridgeACT first grounds task-relevant affordance regions in the current scene, and then predicts task-conditioned 3D motion affordances from human demonstrations. The resulting affordances are mapped to robot actions through a grasping module and a lightweight closed-loop motion controller, enabling direct deployment on real robots. In addition, we represent complex manipulation tasks as compositions of affordance operations, which allows a unified treatment of diverse tasks and object-to-object interactions. Experiments on real-world manipulation tasks show that BridgeACT outperforms prior baselines and generalizes to unseen objects, scenes, and viewpoints.
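The decomposition described above (where to grasp, how to move, and tasks as compositions of affordance operations) can be illustrated with a minimal Python sketch. It is not the authors' implementation: `AffordanceOp`, `step_toward`, `follow`, and `execute` are hypothetical names, the grasp point and waypoints stand in for the grounded affordance region and the predicted 3D motion affordance, and the closed-loop controller is assumed to be a simple clipped proportional tracker running against a simulated robot that reaches every commanded position exactly.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class AffordanceOp:
    """Hypothetical affordance operation: a grasp location plus a 3D motion."""
    name: str
    grasp_point: Point3      # "where to grasp" (stand-in for a grounded region)
    waypoints: List[Point3]  # "how to move" (stand-in for a predicted 3D motion)


def distance(a: Point3, b: Point3) -> float:
    """Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def step_toward(pos: Point3, target: Point3,
                gain: float = 0.5, max_step: float = 0.05) -> Point3:
    """One proportional control step toward target, with clipped step length."""
    dist = distance(pos, target)
    if dist == 0.0:
        return pos
    step = min(gain * dist, max_step)
    return tuple(p + (t - p) / dist * step for p, t in zip(pos, target))


def follow(pos: Point3, waypoints: List[Point3],
           tol: float = 1e-3, max_iters: int = 1000) -> Point3:
    """Track waypoints in order, re-checking the remaining error every step.

    In this simulation the measured position equals the commanded one;
    on hardware it would come from the robot's state estimate instead.
    """
    for wp in waypoints:
        for _ in range(max_iters):
            if distance(pos, wp) <= tol:
                break
            pos = step_toward(pos, wp)
    return pos


def execute(start: Point3, ops: List[AffordanceOp]) -> Point3:
    """Run a task expressed as a sequence of affordance operations."""
    pos = start
    for op in ops:
        pos = follow(pos, [op.grasp_point])  # reach the grasp location
        pos = follow(pos, op.waypoints)      # then follow the motion
    return pos


# Illustrative pick-then-place task with made-up coordinates (metres).
task = [
    AffordanceOp("pick", (0.3, 0.0, 0.1), [(0.3, 0.0, 0.3)]),
    AffordanceOp("place", (0.3, 0.0, 0.3), [(0.0, 0.3, 0.3), (0.0, 0.3, 0.1)]),
]
final_pos = execute((0.0, 0.0, 0.0), task)
```

Expressing each task as a list of operations keeps the controller unchanged across tasks; only the grasp points and waypoints differ.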

Yifan Han, Jianxiang Liu, Haoyu Zhang, Yuqi Gu, Yunhan Guo, Wenzhao Lian • 2026

Related benchmarks

Task	Dataset	Result
Cut	Cross-Object	Success Rate: 30	4
Cut	Cross-Scene	Success Rate: 2	4
Cut	Real-world Motion Affordance	Success Rate: 4	4
Pick	Cross-Scene	Success Rate: 100	4
Place	Real-world Motion Affordance	Success Rate: 8	4
Pour	Real-world Motion Affordance	Success Rate: 40	4
Pick	Cross-Object	Success Rate: 100	4
Pickup	Real-world Motion Affordance	Success Rate: 10	4
Close	Real-world Motion Affordance	Success Rate: 9	3
Open	Cross-Object	Success Rate: 7	3

Showing 10 of 12 rows
