MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
About
This paper introduces MobileH2R, a framework for learning generalizable vision-based human-to-mobile-robot (H2MR) handover skills. Unlike traditional fixed-base handovers, this task requires a mobile robot to reliably receive objects across the large workspace that its mobility enables. Our key insight is that generalizable handover skills can be developed in simulators using high-quality synthetic data, without the need for real-world demonstrations. To achieve this, we propose a scalable pipeline for generating diverse synthetic full-body human motion data, an automated method for creating safe and imitation-friendly demonstrations, and an efficient 4D imitation learning method for distilling large-scale demonstrations into closed-loop policies with base-arm coordination. Experimental evaluations in both simulation and the real world show significant improvements over baseline methods in all cases (at least a +15% gain in success rate). The experiments also confirm that large-scale, diverse synthetic data substantially improves robot learning, underscoring the value of our scalable framework.
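
To make the closed-loop, base-arm-coordinated policy described above more concrete, here is a minimal Python sketch of what such a control interface might look like. This is an illustrative assumption, not the paper's implementation: the class and field names (`HandoverPolicy`, `base_velocity`, `arm_joint_delta`, `gripper_close`) and all shapes are hypothetical, and the learned 4D backbone is replaced by a zero-action placeholder.

```python
import numpy as np

class HandoverPolicy:
    """Hypothetical closed-loop handover policy (illustrative sketch).

    The policy consumes a short history of segmented point clouds
    (a "4D" input: 3D geometry over time) and emits a coordinated
    action for the mobile base and the arm at every control step.
    """

    def __init__(self, history_len: int = 4, num_points: int = 1024):
        self.history_len = history_len
        self.num_points = num_points
        self.buffer: list[np.ndarray] = []  # temporal point-cloud buffer

    def observe(self, point_cloud: np.ndarray) -> None:
        """Append one frame of shape (num_points, 3) to the buffer."""
        assert point_cloud.shape == (self.num_points, 3)
        self.buffer.append(point_cloud)
        self.buffer = self.buffer[-self.history_len:]  # keep recent frames

    def act(self) -> dict:
        """Return a coordinated base-arm action.

        A real policy would run a learned 4D backbone over self.buffer;
        here a zero action stands in as a placeholder.
        """
        return {
            "base_velocity": np.zeros(2),    # (forward speed, yaw rate)
            "arm_joint_delta": np.zeros(7),  # per-joint position offsets
            "gripper_close": False,          # trigger grasp near the object
        }


# Closed-loop rollout: perceive, act, step, repeat.
policy = HandoverPolicy()
for t in range(10):
    frame = np.random.rand(1024, 3)  # stand-in for a segmented depth frame
    policy.observe(frame)
    action = policy.act()
```

The buffered point-cloud history stands in for the 4D observation; distilled large-scale demonstrations would supply the weights that replace the placeholder in `act()`.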
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human-to-Robot Handover | m0 synthetic (test) | Success Rate | 63.8 | 4 |
| Human-to-Robot Handover | n0 synthetic (test) | Success Rate | 53.4 | 4 |
| Human-to-Robot Handover | DexYCB s0 mocap-based (test) | Success Rate | 77.78 | 4 |
| Human-to-mobile-robot handover | Real-world m0 simple setting (test) | Success Rate | 80 | 2 |
| Human-to-mobile-robot handover | Real-world n0 complex setting (test) | Success Rate | 63.3 | 2 |