Structural Action Transformer for 3D Dexterous Manipulation

About

Achieving human-level dexterity in robots via imitation learning from heterogeneous datasets is hindered by the challenge of cross-embodiment skill transfer, particularly for high-DoF robotic hands. Existing methods, often relying on 2D observations and temporal-centric action representation, struggle to capture 3D spatial relations and fail to handle embodiment heterogeneity. This paper proposes the Structural Action Transformer (SAT), a new 3D dexterous manipulation policy that challenges this paradigm by introducing a structural-centric perspective. We reframe each action chunk not as a temporal sequence, but as a variable-length, unordered sequence of joint-wise trajectories. This structural formulation allows a Transformer to natively handle heterogeneous embodiments, treating the joint count as a variable sequence length. To encode structural priors and resolve ambiguity, we introduce an Embodied Joint Codebook that embeds each joint's functional role and kinematic properties. Our model learns to generate these trajectories from 3D point clouds via a continuous-time flow matching objective. We validate our approach by pre-training on large-scale heterogeneous datasets and fine-tuning on simulation and real-world dexterous manipulation tasks. Our method consistently outperforms all baselines, demonstrating superior sample efficiency and effective cross-embodiment skill transfer. This structural-centric representation offers a new path toward scaling policies for high-DoF, heterogeneous manipulators.
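As a rough illustration of the structural-centric reframing described above, the NumPy sketch below transposes an action chunk of shape (T, J) into J joint-wise trajectory tokens, pads token sets from hands with different joint counts into one batch with a validity mask, and builds the linear-path training pair commonly used in flow matching. The function names, the horizon of 8, the joint counts of 24 and 16, and the zero-padding scheme are all assumptions made for illustration; this is not the authors' implementation, and it leaves out the Embodied Joint Codebook and the point-cloud encoder.

```python
import numpy as np

def to_joint_tokens(chunk):
    """Turn a (T, J) action chunk into J tokens, each a length-T joint trajectory."""
    return np.asarray(chunk, dtype=np.float64).T  # shape (J, T)

def pad_batch(token_sets):
    """Pad token sets with different joint counts to a common length.

    Returns the padded batch (B, J_max, T) and a boolean mask (B, J_max)
    that is True for real joints and False for padding.
    """
    max_joints = max(tokens.shape[0] for tokens in token_sets)
    horizon = token_sets[0].shape[1]
    batch = np.zeros((len(token_sets), max_joints, horizon))
    mask = np.zeros((len(token_sets), max_joints), dtype=bool)
    for i, tokens in enumerate(token_sets):
        batch[i, : tokens.shape[0]] = tokens
        mask[i, : tokens.shape[0]] = True
    return batch, mask

def flow_matching_pair(x1, t, rng):
    """Linear-path flow matching: noisy sample x_t and its target velocity."""
    x0 = rng.standard_normal(x1.shape)   # noise sample
    x_t = (1.0 - t) * x0 + t * x1        # point on the straight path at time t
    target_velocity = x1 - x0            # constant velocity along that path
    return x_t, target_velocity

rng = np.random.default_rng(0)
horizon = 8                                        # hypothetical chunk length
hand_a = rng.uniform(-1, 1, size=(horizon, 24))    # hypothetical 24-joint hand
hand_b = rng.uniform(-1, 1, size=(horizon, 16))    # hypothetical 16-joint hand

batch, mask = pad_batch([to_joint_tokens(hand_a), to_joint_tokens(hand_b)])
x_t, velocity = flow_matching_pair(batch, t=0.3, rng=rng)
```

In a setup like this, a Transformer would attend over the joint axis with `mask` hiding the padded tokens, and a regression loss against `velocity` would be averaged over real joints only.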

Xiaohan Lei, Min Wang, Bohong Weng, Wengang Zhou, Houqiang Li • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Dexterous Hand Control | Adroit | Overall Avg Success Rate | 75 | 19 |
| Dexterous Hand Manipulation | DexArt | Success Rate | 73 | 12 |
| Dexterous Manipulation | Bi-DexHands | Success Rate | 67 | 6 |
| Dexterous Manipulation | Adroit, DexArt, and Bi-DexHands | Average Success | 71 | 6 |
| Bimanual Manipulation | Real-world bimanual manipulation: Remove the pen cap | Success Rate | 30 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Hand over Baymax | Success Rate | 85 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Push then grab box | Success Rate | 35 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Place block in plate | Success Rate | 90 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Brush the cup | Success Rate | 45 | 3 |
| Bimanual Manipulation | Real-world bimanual manipulation: Grasp basketball | Success Rate | 95 | 3 |
