Structural Action Transformer for 3D Dexterous Manipulation
About
Achieving human-level dexterity in robots via imitation learning from heterogeneous datasets is hindered by the challenge of cross-embodiment skill transfer, particularly for high-DoF robotic hands. Existing methods, often relying on 2D observations and temporal-centric action representation, struggle to capture 3D spatial relations and fail to handle embodiment heterogeneity. This paper proposes the Structural Action Transformer (SAT), a new 3D dexterous manipulation policy that challenges this paradigm by introducing a structural-centric perspective. We reframe each action chunk not as a temporal sequence, but as a variable-length, unordered sequence of joint-wise trajectories. This structural formulation allows a Transformer to natively handle heterogeneous embodiments, treating the joint count as a variable sequence length. To encode structural priors and resolve ambiguity, we introduce an Embodied Joint Codebook that embeds each joint's functional role and kinematic properties. Our model learns to generate these trajectories from 3D point clouds via a continuous-time flow matching objective. We validate our approach by pre-training on large-scale heterogeneous datasets and fine-tuning on simulation and real-world dexterous manipulation tasks. Our method consistently outperforms all baselines, demonstrating superior sample efficiency and effective cross-embodiment skill transfer. This structural-centric representation offers a new path toward scaling policies for high-DoF, heterogeneous manipulators.
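The structural reformulation in the abstract can be shown with a small, dependency-free sketch. The sketch makes several assumptions. An action chunk is stored as `chunk[t][j]` (time step by joint), and the sketch transposes it into one trajectory token per joint. Each token gets an embedding from a toy codebook keyed by hypothetical `(role, joint_type)` labels. Hands with different joint counts are padded into one batch with a validity mask. Finally, the sketch builds a flow-matching training pair on a straight noise-to-data path. All names (`JointSpec`, `JointCodebook`, `embed_joints`, `pad_batch`, `flow_matching_pair`) are illustrative. Concatenating codebook vectors onto trajectories and using a linear interpolation path are assumptions, not SAT's published design. No Transformer is included.

```python
# Hypothetical sketch of joint-wise action tokens and a flow-matching pair (not SAT's code).
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

Traj = List[float]


@dataclass(frozen=True)
class JointSpec:
    # Hypothetical descriptors used as the codebook key (labels are illustrative).
    role: str        # functional role, e.g. "index_pip"
    joint_type: str  # kinematic property, e.g. "revolute"


def chunk_to_joint_tokens(chunk: List[List[float]]) -> List[Traj]:
    """Transpose a temporal chunk chunk[t][j] into one trajectory per joint, [j][t]."""
    num_steps, num_joints = len(chunk), len(chunk[0])
    return [[chunk[t][j] for t in range(num_steps)] for j in range(num_joints)]


class JointCodebook:
    """Toy stand-in for an embodied joint codebook: one vector per (role, type) key."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self.rng = random.Random(seed)
        self.table: Dict[Tuple[str, str], List[float]] = {}

    def lookup(self, spec: JointSpec) -> List[float]:
        key = (spec.role, spec.joint_type)
        if key not in self.table:  # lazily create a small random entry (would be learned)
            self.table[key] = [self.rng.gauss(0.0, 0.02) for _ in range(self.dim)]
        return self.table[key]


def embed_joints(trajs: List[Traj], specs: List[JointSpec], book: JointCodebook) -> List[List[float]]:
    """Append each joint's codebook vector to its trajectory; no joint-order index is added."""
    return [traj + book.lookup(spec) for traj, spec in zip(trajs, specs)]


def pad_batch(batch: List[List[List[float]]]) -> Tuple[List[List[List[float]]], List[List[bool]]]:
    """Pad token sequences with different joint counts to one length; mask marks real joints."""
    max_len = max(len(tokens) for tokens in batch)
    dim = len(batch[0][0])
    padded, masks = [], []
    for tokens in batch:
        n_pad = max_len - len(tokens)
        padded.append(tokens + [[0.0] * dim for _ in range(n_pad)])
        masks.append([True] * len(tokens) + [False] * n_pad)
    return padded, masks


def flow_matching_pair(target: List[Traj], t: float, rng: random.Random) -> Tuple[List[Traj], List[Traj]]:
    """Linear path (assumed): x_t = (1 - t) * noise + t * target; velocity = target - noise."""
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in target]
    x_t = [[(1 - t) * n + t * a for n, a in zip(nr, ar)] for nr, ar in zip(noise, target)]
    velocity = [[a - n for n, a in zip(nr, ar)] for nr, ar in zip(noise, target)]
    return x_t, velocity


# Usage: two hypothetical embodiments with different joint counts, same 3-step horizon.
book = JointCodebook(dim=2)
hand_a = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]                 # 3 steps x 2 joints
hand_b = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]  # 3 steps x 3 joints
specs_a = [JointSpec("thumb_mcp", "revolute"), JointSpec("index_mcp", "revolute")]
specs_b = specs_a + [JointSpec("wrist_roll", "revolute")]
tokens_a = embed_joints(chunk_to_joint_tokens(hand_a), specs_a, book)
tokens_b = embed_joints(chunk_to_joint_tokens(hand_b), specs_b, book)
padded, masks = pad_batch([tokens_a, tokens_b])
print(len(padded[0]), masks[0])  # 3 [True, True, False]

# A velocity-prediction model would be trained to regress `velocity` from (x_t, t).
x_t, velocity = flow_matching_pair(chunk_to_joint_tokens(hand_a), 0.3, random.Random(1))
```

No joint-order index is attached to the tokens. Self-attention without positional encoding over the joint axis is permutation-equivariant, so such a model would treat the joints as an unordered set. The mask lets hands with different joint counts share one batch. The codebook entries are what distinguish joints whose trajectories would otherwise look alike.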
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dexterous Hand Control | Adroit | Overall Avg Success Rate | 75 | 19 |
| Dexterous Hand Manipulation | DexArt | Success Rate | 73 | 12 |
| Dexterous Manipulation | Bi-DexHands | Success Rate | 67 | 6 |
| Dexterous Manipulation | Adroit, DexArt, and Bi-DexHands | Average Success | 71 | 6 |
| Bimanual Manipulation | Real-world bimanual manipulation: Remove the pen cap | Success Rate | 30 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Hand over Baymax | Success Rate | 85 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Push then grab box | Success Rate | 35 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Place block in plate | Success Rate | 90 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Brush the cup | Success Rate | 45 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Grasp basketball | Success Rate | 95 | 3 |