H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

About

Imitation learning for robotic manipulation faces a fundamental challenge: the scarcity of large-scale, high-quality robot demonstration data. Recent robotic foundation models often pre-train on cross-embodiment robot datasets to increase data scale, but the diverse morphologies and action spaces of different robot embodiments make unified training difficult. In this paper, we present H-RDT (Human to Robotics Diffusion Transformer), an approach that leverages human manipulation data to enhance robot manipulation capabilities. Our key insight is that large-scale egocentric human manipulation videos with paired 3D hand pose annotations provide rich behavioral priors that capture natural manipulation strategies and can benefit robotic policy learning. We introduce a two-stage training paradigm: (1) pre-training on large-scale egocentric human manipulation data, and (2) cross-embodiment fine-tuning on robot-specific data with modular action encoders and decoders. Built on a diffusion transformer architecture with 2B parameters, H-RDT uses flow matching to model complex action distributions. Extensive evaluations covering simulation and real-world experiments, single-task and multitask scenarios, few-shot learning, and robustness assessments show that H-RDT outperforms both training from scratch and existing state-of-the-art methods, including Pi0 and RDT, with improvements of 13.9% and 40.5% over training from scratch in simulation and real-world experiments, respectively. The results validate our core hypothesis that human manipulation data can serve as a powerful foundation for learning bimanual robotic manipulation policies.
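The abstract states that H-RDT uses flow matching to model action distributions but gives no further detail. As a rough illustration of the general technique only, the following is a minimal NumPy sketch of conditional flow matching with a straight-line noise-to-action path. It is not the authors' implementation: the function names `flow_matching_loss` and `sample_actions`, the flattened `(batch, dim)` action layout, the uniform time sampling, and the Euler sampler are all assumptions made for this example, and `velocity_fn` stands in for whatever network predicts the velocity.

```python
import numpy as np

def flow_matching_loss(velocity_fn, actions, rng):
    """Mean squared error between predicted and target velocities.

    Assumes a straight-line path x_t = (1 - t) * noise + t * actions,
    whose velocity is the constant (actions - noise).
    `actions` has shape (batch, dim); this layout is hypothetical.
    """
    noise = rng.standard_normal(actions.shape)    # x_0 ~ N(0, I)
    t = rng.uniform(size=(actions.shape[0], 1))   # one time in [0, 1) per sample
    x_t = (1.0 - t) * noise + t * actions         # point on the path at time t
    target = actions - noise                      # velocity of the straight path
    pred = velocity_fn(x_t, t)                    # model's velocity estimate
    return float(np.mean((pred - target) ** 2))

def sample_actions(velocity_fn, shape, rng, steps=10):
    """Integrate dx/dt = v(x, t) from noise (t = 0) toward t = 1 with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

In a real policy, `velocity_fn` would also take observations (for example images and proprioception) as conditioning, and the loss would be minimized with a gradient-based optimizer; both are omitted here to keep the sketch short.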

Hongzhe Bi, Lingxuan Wu, Tianwei Lin, Hengkai Tan, Zhizhong Su, Hang Su, Jun Zhu • 2025

Related benchmarks

Task	Dataset	Result
Task 7: Hold the lunch bag and squat down to place on the table	Real-world	Hold Success Rate: 90	8
Task 4: Grab the can, turn and pour onto plate, push the cart forward	Real-world	Grasp Success: 30	8
Task 1: Remove the lid, turn on the faucet, and fill with water	Real-world	Grasp Success Rate: 70	8
Task 3: Pick the bottle, turn around, and pour into cup	Real-world	Grasp Success Rate: 1	8
Task 8: Pull out the tray and turn to throw the chip can into the trash	Real-world	Grasp Success Rate: 80	8
Task 5: Put toy into basket, walk to human, hand it over	Real-world	Grasp Success Rate: 20	8
Task 2: Spray the bowl with water, wipe clean, and fold it up	Real-world	Grasp Success Rate: 0	8
Task 6: Push the cart, grab the grapes, and place on the plate	Real-world	Handle Success Rate: 0	8
Robot Manipulation	RoboTwin	Success Rate (Click Bell): 88	6
Bimanual Robotic Manipulation	AV-ALOHA OOD-Distractors	Cube Transfer: 28	5

Showing 10 of 23 rows
