Partner-Aware Hierarchical Skill Discovery for Robust Human-AI Collaboration

About

Multi-agent collaboration, especially in human-AI teaming, requires agents that can adapt to novel partners with diverse and dynamic behaviors. Conventional Deep Hierarchical Reinforcement Learning (DHRL) methods focus on agent-centric rewards and overlook partner behavior, leading to shortcut learning, where skills exploit spurious information instead of adapting to partners' dynamic behaviors. This limitation undermines agents' ability to adapt and coordinate effectively with novel partners. We introduce Partner-Aware Skill Discovery (PASD), a DHRL framework that learns skills conditioned on partner behavior. PASD introduces a contrastive intrinsic reward to capture patterns emerging from partner interactions, aligning skill representations across similar partners while maintaining discriminability across diverse strategies. By structuring the skill space based on partner interactions, this approach mitigates shortcut learning and promotes behavioral consistency, enabling robust and adaptive coordination. We extensively evaluate PASD in the Overcooked-AI benchmark with a diverse population of partners characterized by varying skill levels and play styles. We further evaluate the approach with human proxy models trained from human-human gameplay trajectories. PASD consistently outperforms existing population-based and hierarchical baselines, demonstrating transferable skill learning that generalizes across a wide range of partner behaviors. Analysis of learned skill representations shows that PASD adapts effectively to diverse partner behaviors, highlighting its robustness in human-AI collaboration.
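The abstract describes the contrastive intrinsic reward only at a high level, so the following is a minimal sketch of one common way such a reward could be written, not the authors' formulation. It assumes an InfoNCE-style objective over skill embeddings: the reward is high when the current skill embedding lies close to an embedding collected with a similar partner and far from embeddings collected with different partners. The function names, the cosine similarity, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_intrinsic_reward(
    anchor: Vector,
    positive: Vector,
    negatives: Sequence[Vector],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> float:
    """InfoNCE-style reward: log-probability of picking the positive
    embedding among the positive and all negatives. Always <= 0."""
    pos_logit = cosine_similarity(anchor, positive) / temperature
    logits = [pos_logit] + [
        cosine_similarity(anchor, neg) / temperature for neg in negatives
    ]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    return pos_logit - log_denominator


# Toy 2-D embeddings (illustrative values only).
anchor = [1.0, 0.0]
similar_partner = [0.9, 0.1]
different_partners = [[0.0, 1.0], [-1.0, 0.0]]

# Skill embedding matches the similar partner: reward close to 0.
aligned = contrastive_intrinsic_reward(anchor, similar_partner, different_partners)

# Skill embedding is paired with a dissimilar partner: strongly negative reward.
misaligned = contrastive_intrinsic_reward(
    anchor, [0.0, 1.0], [similar_partner, [-1.0, 0.0]]
)
```

Under these assumptions, maximizing the reward pulls embeddings from similar partners together while pushing apart embeddings from different strategies, which matches the alignment and discriminability properties the abstract attributes to PASD. How partners are grouped as similar or different is not specified here and would need to come from the paper itself.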

Adnan Ahmad, Bahareh Nakisa, Mohammad Naim Rastgoo • 2026

Related benchmarks

Task	Dataset	Result
Collaborative Cooking	Overcooked Asym. Adv.	Total Mean Reward 112.5	4
Collaborative Cooking	Overcooked Coord. Ring	Mean Reward 105.6	4
Collaborative Cooking	Overcooked Counter Circuit	Total Mean Reward 44.38	4
Collaborative Cooking	Overcooked Forced Coord.	Total Mean Reward 43.8	4
Cooperative Cooking	Overcooked-AI Cramped Room	Total Mean Reward 150	4
Human-AI Collaboration	Overcooked-AI (Evaluation partner population (novel AI behaviors))	Cramped Room Score 165.8	4
Human-AI Coordination	Overcooked Cramped Room	Mean Reward 118.2	4
Human-AI Coordination	Overcooked Asym. Adv.	Mean Reward 198.2	4
Human-AI Coordination	Overcooked Coord. Ring	Mean Reward 60	4
Human-AI Coordination	Overcooked Counter Circuit	Mean Reward 62.5	4

Showing 10 of 11 rows
