
HALO: Learning Human-Robot Collaboration via Heterogeneous-Agent Lyapunov Policy Optimization

About

To improve generalization and resilience in human-robot collaboration (HRC), robots must contend with diverse combinations of human behaviors and contexts, motivating multi-agent reinforcement learning (MARL). However, inherent heterogeneity between robots and humans creates a rationality gap (RG), where decentralized policy updates deviate from cooperative joint optimization. The resulting learning problem is a general-sum differentiable game, so independent policy-gradient updates can oscillate or diverge without added structure. We propose heterogeneous-agent Lyapunov policy optimization (HALO), a framework that stabilizes decentralized MARL by enforcing Lyapunov-based contraction in policy-parameter space. Unlike Lyapunov-based safe RL, which targets state/trajectory constraints in constrained Markov decision processes, HALO uses Lyapunov certification to stabilize decentralized policy learning. HALO rectifies decentralized gradients via optimal quadratic projections, ensuring monotonic contraction of RG and enabling effective exploration of open-ended interaction spaces. Extensive simulations and real-world humanoid-robot experiments show that this certified stability improves generalization and robustness in collaborative corner cases. Our project website is available at https://HaoZhang-THU.github.io/HALO/.
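The idea of rectifying a decentralized gradient through a quadratic projection can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a toy two-player bilinear game (player x minimizes xy, player y maximizes it), uses V(θ) = ½‖θ‖² as a hypothetical stand-in for a Lyapunov function, and imposes the condition ∇V·d ≤ −αV with an assumed rate α. With a single linear constraint, the quadratic program "find the direction closest to d that satisfies the condition" has the closed-form half-space projection used below. The names `project_update` and `run` are illustrative only.

```python
import numpy as np


def project_update(d, grad_v, v, alpha=1.0):
    """Return the direction closest to d (Euclidean norm) that satisfies
    grad_v @ d_star <= -alpha * v, i.e. a half-space projection."""
    violation = grad_v @ d + alpha * v
    norm_sq = grad_v @ grad_v
    if violation <= 0.0 or norm_sq == 0.0:
        return d  # condition already holds, or no usable gradient of V
    return d - (violation / norm_sq) * grad_v


def run(steps=200, lr=0.1, alpha=1.0, rectify=True):
    """Simulate independent gradient play on f(x, y) = x * y.

    Returns the final value of the stand-in Lyapunov function V.
    """
    theta = np.array([1.0, 1.0])  # stacked parameters (x, y)
    for _ in range(steps):
        x, y = theta
        # x descends on x*y, y ascends on x*y: the decentralized update
        d = np.array([-y, x])
        v = 0.5 * theta @ theta   # assumed V: squared distance to equilibrium
        if rectify:
            d = project_update(d, theta, v, alpha)  # grad V = theta
        theta = theta + lr * d
    return 0.5 * theta @ theta


if __name__ == "__main__":
    print("V without projection:", run(rectify=False))
    print("V with projection:   ", run(rectify=True))
```

In this toy setting the unprojected updates spiral away from the equilibrium, while the projected updates shrink V at every step. HALO's actual Lyapunov function, constraint, and projection are those defined in the paper and may differ from this sketch.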

Hao Zhang, Yaru Niu, Yikai Wang, Ding Zhao, H. Eric Tseng • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Heterogeneous Coordination | OSP | Success Rate | 92.8 | 16
Heterogeneous Coordination | SCT | Success Rate | 91.1 | 16
Heterogeneous Coordination | SLH | Success Rate | 88.2 | 16
Multi-agent optimization analysis | OSP, SCT, and SLH Global | Overall SR | 86 | 4
Orientation-sensitive pushing | OSP (real-world deployment, 5 trials) | Time to destination (s) | 61.7 | 3
Spatially-confined transport | SCT (real-world deployment, 5 trials) | Time to destination (s) | 76.2 | 3
Stability Under Halting | SLH (real-world deployment, 5 trials) | Object Drop Rate (%) | 20 | 3
