
Partner-Aware Hierarchical Skill Discovery for Robust Human-AI Collaboration

About

Multi-agent collaboration, especially in human-AI teaming, requires agents that can adapt to novel partners with diverse and dynamic behaviors. Conventional Deep Hierarchical Reinforcement Learning (DHRL) methods focus on agent-centric rewards and overlook partner behavior, leading to shortcut learning, where skills exploit spurious information instead of adapting to partners' dynamic behaviors. This limitation undermines agents' ability to adapt and coordinate effectively with novel partners. We introduce Partner-Aware Skill Discovery (PASD), a DHRL framework that learns skills conditioned on partner behavior. PASD introduces a contrastive intrinsic reward to capture patterns emerging from partner interactions, aligning skill representations across similar partners while maintaining discriminability across diverse strategies. By structuring the skill space based on partner interactions, this approach mitigates shortcut learning and promotes behavioral consistency, enabling robust and adaptive coordination. We extensively evaluate PASD in the Overcooked-AI benchmark with a diverse population of partners characterized by varying skill levels and play styles. We further evaluate the approach with human proxy models trained from human-human gameplay trajectories. PASD consistently outperforms existing population-based and hierarchical baselines, demonstrating transferable skill learning that generalizes across a wide range of partner behaviors. Analysis of learned skill representations shows that PASD adapts effectively to diverse partner behaviors, highlighting its robustness in human-AI collaboration.
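The abstract does not give the form of the contrastive intrinsic reward, so the following is only a minimal sketch of how such a reward could look, not the PASD implementation. It assumes an InfoNCE-style objective, cosine similarity, and a temperature of 0.1. The function names `l2_normalize` and `contrastive_intrinsic_reward`, and the idea of pairing each skill embedding with one partner-behavior embedding per batch row, are all hypothetical. Each skill embedding is rewarded for being closer to its own partner embedding than to those of the other partners in the batch.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def contrastive_intrinsic_reward(skill_emb, partner_emb, temperature=0.1):
    """InfoNCE-style reward (an assumed form, not taken from the paper).

    skill_emb:   (N, D) array, one skill embedding per batch row.
    partner_emb: (N, D) array, row i is the partner-behavior embedding
                 that skill i was executed against (the positive pair).
    Returns an (N,) array of log-probabilities, each <= 0. A value near 0
    means the skill embedding identifies its own partner among the batch.
    """
    z = l2_normalize(np.asarray(skill_emb, dtype=float))
    c = l2_normalize(np.asarray(partner_emb, dtype=float))
    logits = z @ c.T / temperature                    # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.diag(log_probs).copy()                  # keep only positive pairs


# Toy check with four orthogonal "partner types".
partners = np.eye(4)
matched = contrastive_intrinsic_reward(partners, partners)
mismatched = contrastive_intrinsic_reward(partners, np.roll(partners, 1, axis=0))
print(matched.mean() > mismatched.mean())
```

In this toy setup, skills aligned with their own partner receive a reward near zero, while skills paired with the wrong partner are penalized. This mirrors the stated goal of aligning skill representations across similar partners while keeping them discriminable across different strategies.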

Adnan Ahmad, Bahareh Nakisa, Mohammad Naim Rastgoo • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Collaborative Cooking | Overcooked Asym. Adv. | Total Mean Reward | 112.5 | 4 |
| Collaborative Cooking | Overcooked Coord. Ring | Mean Reward | 105.6 | 4 |
| Collaborative Cooking | Overcooked Counter Circuit | Total Mean Reward | 44.38 | 4 |
| Collaborative Cooking | Overcooked Forced Coord. | Total Mean Reward | 43.8 | 4 |
| Cooperative Cooking | Overcooked-AI Cramped Room | Total Mean Reward | 150 | 4 |
| Human-AI Collaboration | Overcooked-AI (Evaluation partner population (novel AI behaviors)) | Cramped Room Score | 165.8 | 4 |
| Human-AI Coordination | Overcooked Cramped Room | Mean Reward | 118.2 | 4 |
| Human-AI Coordination | Overcooked Asym. Adv. | Mean Reward | 198.2 | 4 |
| Human-AI Coordination | Overcooked Coord. Ring | Mean Reward | 60 | 4 |
| Human-AI Coordination | Overcooked Counter Circuit | Mean Reward | 62.5 | 4 |
Showing 10 of 11 rows
