Bimanual Robot Manipulation via Multi-Agent In-Context Learning

About

Large Language Models (LLMs) have emerged as powerful reasoning engines for embodied control. In particular, In-Context Learning (ICL) enables off-the-shelf, text-only LLMs to predict robot actions without any task-specific training while preserving their generalization capabilities. Applying ICL to bimanual manipulation remains challenging, however, because the high-dimensional joint action space and tight inter-arm coordination constraints rapidly overwhelm standard context windows. To address this, we introduce BiCICLe (Bimanual Coordinated In-Context Learning), the first framework that enables standard LLMs to perform few-shot bimanual manipulation without fine-tuning. BiCICLe frames bimanual control as a multi-agent leader-follower problem, decoupling the action space into sequential, conditioned single-arm predictions. Evaluated on 13 tasks from the TWIN benchmark, BiCICLe achieves a 70.5% average success rate, outperforming the best training-free baseline by 6.1 percentage points and surpassing most supervised methods. We also demonstrate superior real-world performance on 3 tasks without hardware-specific retraining.
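The leader-follower decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt layout, the function names (`build_prompt`, `predict_bimanual`), the flat list-of-floats action format, and the `stub_llm` stand-in are all assumptions made for illustration. The only idea taken from the abstract is that one arm's action is predicted first and the second arm's action is predicted conditioned on it.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical types: an "LLM" is any function from prompt text to completion text,
# and a demonstration is (observation text, leader action, follower action).
LLM = Callable[[str], str]
Demo = Tuple[str, Sequence[float], Sequence[float]]


def format_action(action: Sequence[float]) -> str:
    """Serialize an action as space-separated numbers with two decimals."""
    return " ".join(f"{v:.2f}" for v in action)


def parse_action(text: str) -> List[float]:
    """Parse a space-separated completion back into a list of floats."""
    return [float(tok) for tok in text.split()]


def build_prompt(demos: Sequence[Demo], observation: str, role: str,
                 leader_action: Optional[Sequence[float]] = None) -> str:
    """Build a few-shot prompt for one arm (assumed layout, not from the paper)."""
    lines: List[str] = []
    for demo_obs, demo_leader, demo_follower in demos:
        lines.append(f"obs: {demo_obs}")
        lines.append(f"leader: {format_action(demo_leader)}")
        if role == "follower":
            lines.append(f"follower: {format_action(demo_follower)}")
    lines.append(f"obs: {observation}")
    if role == "follower":
        # The follower sees the leader's freshly predicted action.
        lines.append(f"leader: {format_action(leader_action)}")
    lines.append(f"{role}:")
    return "\n".join(lines)


def predict_bimanual(llm: LLM, demos: Sequence[Demo],
                     observation: str) -> Tuple[List[float], List[float]]:
    """Two sequential single-arm predictions instead of one joint prediction."""
    leader = parse_action(llm(build_prompt(demos, observation, "leader")))
    follower = parse_action(
        llm(build_prompt(demos, observation, "follower", leader)))
    return leader, follower


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a real LLM: the follower mirrors the leader's first value."""
    lines = prompt.splitlines()
    if lines[-1] == "leader:":
        return "0.10 0.20 0.30"
    leader = parse_action(lines[-2].split(":", 1)[1])
    leader[0] = -leader[0]
    return format_action(leader)


if __name__ == "__main__":
    demos = [("box at left", [0.0, 0.2, 0.3], [0.0, 0.2, 0.3])]
    print(predict_bimanual(stub_llm, demos, "box at center"))
```

In this sketch each call only has to produce a single arm's action, which is one plausible reading of how sequential, conditioned predictions avoid asking the model for the full joint action at once.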

Alessio Palma, Indro Spinelli, Vignesh Prasad, Luca Scofano, Yufeng Jin, Georgia Chalvatzaki, Fabio Galasso • 2026

Related benchmarks

Task	Dataset	Result
Bimanual Robot Manipulation	TWIN 1.0 (test)	Push Box Success Rate: 99	18
Bimanual Robot Manipulation	TWIN	Push Box Success Rate: 99	3
Bimanual Robot Manipulation	Novel Bimanual Tasks Generalization outside TWIN benchmark (test)	Close Jar Success Rate: 61	2

