Adaptive Human-AI Coordination via Hierarchical Action Disentanglement
About
Human-AI collaboration requires agents that can adapt to diverse partner behaviors and skill levels while remaining robust to unseen partners. Existing methods often collapse to a single dominant behavior or learn poorly aligned skills, limiting effective coordination. We propose Intrinsic Action Disentanglement (IAD), a deep hierarchical reinforcement learning (DHRL) framework that learns distinct, partner-aware low-level action sequences conditioned on high-level latent skills. IAD introduces an intrinsic reward that explicitly encourages disentangled action distributions of the agent's low-level policy across skills, yielding an interpretable mapping between high-level decisions and partner-specific behavioral responses. By capturing temporally extended interaction patterns, IAD enables flexible adaptation to heterogeneous partner dynamics under distributional shift. We evaluate IAD in the Overcooked-AI domain across multiple layouts and diverse partner settings, including unseen simulated partners, a human-proxy model trained on human-human gameplay, and real human partners. Results show that IAD consistently outperforms strong baselines and achieves more reliable, adaptive coordination across all settings.
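The abstract does not give the form of the intrinsic reward, so the following is only an illustrative sketch of one way a disentanglement bonus across skills could be computed. It assumes discrete actions and uses the mean Jensen-Shannon divergence between the active skill's action distribution and those of the other skills. The function names, the choice of divergence, and the weighting coefficient mentioned afterwards are all assumptions for illustration and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def disentanglement_bonus(action_probs_by_skill, active_skill):
    """Hypothetical intrinsic bonus (an assumption, not the paper's formula).

    action_probs_by_skill[z] is the low-level policy's action distribution
    in the current state when conditioned on skill z. The bonus is the mean
    JS divergence between the active skill's distribution and each other
    skill's distribution, so it grows as skills induce more distinct actions.
    """
    p = action_probs_by_skill[active_skill]
    others = [q for z, q in enumerate(action_probs_by_skill) if z != active_skill]
    if not others:
        return 0.0
    return sum(js_divergence(p, q) for q in others) / len(others)


# Two skills that prefer different actions yield a positive bonus.
probs = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
bonus = disentanglement_bonus(probs, active_skill=0)
```

In a training loop, such a bonus would typically be added to the environment reward with a small weight (a hypothetical coefficient, say `beta`), so that the low-level policy is rewarded both for task progress and for behaving differently under different skills.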
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Cooperative Multi-Agent Coordination | Overcooked-AI Cramped Room | Total Mean Reward: 173 | 4 |
| Cooperative Multi-Agent Coordination | Overcooked-AI Asymmetric Advantages | Mean Reward: 140.6 | 4 |
| Cooperative Multi-Agent Coordination | Overcooked-AI Coordination Ring | Total Mean Reward: 115.4 | 4 |
| Cooperative Multi-Agent Coordination | Overcooked-AI Counter Circuit | Total Mean Reward: 60.42 | 4 |
| Cooperative Multi-Agent Coordination | Overcooked-AI Forced Coordination | Total Mean Reward: 44.25 | 4 |
| Cooperative Multi-Agent Reinforcement Learning | Overcooked-AI Cramped Room (test) | Mean Reward: 148.8 | 4 |
| Cooperative Multi-Agent Reinforcement Learning | Overcooked-AI Asymmetric Advantages (test) | Mean Reward: 124.4 | 4 |
| Cooperative Multi-Agent Reinforcement Learning | Overcooked-AI Coordination Ring (test) | Mean Reward: 97.35 | 4 |
| Cooperative Multi-Agent Reinforcement Learning | Overcooked-AI Counter Circuit (test) | Mean Reward: 43.65 | 4 |
| Cooperative Multi-Agent Reinforcement Learning | Overcooked-AI Forced Coordination (test) | Mean Reward: 35.43 | 4 |