
SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse

About

Recent progress in vision-language-action (VLA) models has demonstrated strong potential for dual-arm manipulation, enabling complex behaviors and generalization to unseen environments. However, mainstream bimanual VLA formulations largely overlook a critical challenge: combinatorial diversity. Different pairings of single-arm behaviors can induce qualitatively distinct task behaviors, yet existing models do not explicitly account for this structure. We argue that effective bimanual VLAs should support skill reuse, that is, the ability to recombine previously learned single-arm skills across novel left-right pairings, thereby avoiding the need to learn every possible combination separately. Current VLA designs entangle skills across arms, which prevents such recomposition and limits scalability. To address this limitation, we propose SkillVLA, a framework explicitly designed to enable skill reuse in dual-arm manipulation. Extensive experiments demonstrate that SkillVLA substantially improves skill composition, raising the overall success rate from 0% to 51%, and achieves strong performance on cooperative and long-horizon tasks.
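The skill-reuse argument above is essentially a counting argument. The toy Python sketch below is not SkillVLA's method, and every skill name and the choice of training pairs in it are hypothetical. With three skills per arm, an entangled model must cover 9 left-right pairings. If only the 3 "diagonal" pairings are demonstrated, each single-arm skill has still been seen at least once, so the 6 unseen pairings are reachable in principle by recombining learned single-arm skills.

```python
from itertools import product

# Hypothetical single-arm skill names, chosen only for illustration.
LEFT = ["grasp_cup", "lift_box", "open_drawer"]
RIGHT = ["grasp_cake", "push_button", "wipe_table"]

# Assumed training set: only the "diagonal" pairings are demonstrated together.
TRAIN_PAIRS = list(zip(LEFT, RIGHT))


def all_pairings(left, right):
    """Every left-right combination an entangled bimanual policy would need to cover."""
    return set(product(left, right))


def unseen_pairings(left, right, train_pairs):
    """Pairings that were never demonstrated together during training."""
    return all_pairings(left, right) - set(train_pairs)


def composable(pair, train_pairs):
    """A pairing is reachable via reuse if each arm's skill appeared in *some* training pair."""
    seen_left = {l for l, _ in train_pairs}
    seen_right = {r for _, r in train_pairs}
    left_skill, right_skill = pair
    return left_skill in seen_left and right_skill in seen_right


unseen = unseen_pairings(LEFT, RIGHT, TRAIN_PAIRS)
print(len(all_pairings(LEFT, RIGHT)))                    # 9 pairings in total
print(len(unseen))                                       # 6 never demonstrated together
print(all(composable(p, TRAIN_PAIRS) for p in unseen))  # True
```

The gap grows quickly. With N skills per arm, an entangled formulation faces N × N pairings, while reuse only needs the 2N single-arm skills to be covered somewhere in the data.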

Xuanran Zhai, Zekai Huang, Longyan Wu, Qianyou Zhao, Qiaojun Yu, Jieji Ren, Ce Hao, Harold Soh • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Bimanual Manipulation | Skill Recomposition Tasks (Unseen Combinations) | Success Rate (Cup x Cake) | 70 | 4
Cooperative Bimanual Manipulation | Cooperative Bimanual Tasks | Shake Success Rate | 25 | 4
Robot Manipulation | Skills Learned | Success Rate (Cup) | 80 | 4
