SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse

About

Recent progress in vision-language-action (VLA) models has demonstrated strong potential for dual-arm manipulation, enabling complex behaviors and generalization to unseen environments. However, mainstream bimanual VLA formulations largely overlook the critical challenge of combinatorial diversity. Different pairings of single-arm behaviors can induce qualitatively distinct task behaviors, yet existing models do not explicitly account for this structure. We argue that effective bimanual VLAs should support skill reuse - the ability to recombine previously learned single-arm skills across novel left-right pairings - thereby avoiding the need to separately learn every possible combination. Current VLA designs entangle skills across arms, preventing such recomposition and limiting scalability. To address this limitation, we propose SkillVLA, a framework explicitly designed to enable skill reuse in dual-arm manipulation. Extensive experiments demonstrate that SkillVLA substantially improves skill composition, increasing overall success rate from 0% to 51%, and achieves strong performance on cooperative and long-horizon tasks.
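The combinatorial argument in the abstract can be made concrete with a small counting sketch. The skill names below are hypothetical placeholders, not the skills used in the paper, and the code only illustrates the counting: with k skills per arm there are k x k left-right pairings, whereas a model that can recombine skills needs to learn only the k + k single-arm skills.

```python
from itertools import product

# Hypothetical single-arm skills, chosen only for illustration;
# these are not the skill set used in the paper.
left_skills = ["pick_cup", "pick_cake", "open_drawer"]
right_skills = ["pick_cup", "pick_cake", "open_drawer"]


def count_pairings(left, right):
    """Count every distinct (left skill, right skill) pairing."""
    return len(list(product(left, right)))


def count_single_arm_skills(left, right):
    """Count the single-arm skills to learn if pairings can be recomposed."""
    return len(left) + len(right)


pairings = count_pairings(left_skills, right_skills)
singles = count_single_arm_skills(left_skills, right_skills)
print(f"pairings to cover without reuse: {pairings}")
print(f"single-arm skills to learn with reuse: {singles}")
```

The gap widens quickly as the skill set grows: the number of pairings grows as the product of the two skill counts, while the number of single-arm skills grows only as their sum.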

Xuanran Zhai, Zekai Huang, Longyan Wu, Qianyou Zhao, Qiaojun Yu, Jieji Ren, Ce Hao, Harold Soh • 2026

Related benchmarks

Task	Dataset	Result
Bimanual Manipulation	Skill Recomposition Tasks Unseen Combinations	Success Rate (Cup x Cake): 70	4
Cooperative Bimanual Manipulation	Cooperative Bimanual Tasks	Shake Success Rate: 25	4
Robot Manipulation	Skills Learned	Success Rate (Cup): 80	4

