
Learning Generalizable Skills from Offline Multi-Task Data for Multi-Agent Cooperation

About

Learning a cooperative multi-agent policy from offline multi-task data that generalizes to unseen tasks with varying numbers of agents and targets is an attractive problem in many scenarios. Although aggregating general behavior patterns shared across tasks into skills is a promising way to improve policy transfer, two primary challenges hinder further advancement of skill learning in offline multi-task MARL. First, extracting general cooperative behaviors from diverse action sequences as common skills fails to incorporate cooperative temporal knowledge into them. Second, existing works exploit only common skills and cannot adaptively select task-specific knowledge as task-specific skills for fine-grained action execution in each task. To tackle these challenges, we propose Hierarchical and Separate Skill Discovery (HiSSD), a novel approach for generalizable offline multi-task MARL through skill learning. HiSSD leverages a hierarchical framework that jointly learns common and task-specific skills. The common skills capture cooperative temporal knowledge and enable in-sample exploitation for offline multi-task MARL; the task-specific skills represent the priors of each task and enable task-guided, fine-grained action execution. To verify the effectiveness of our method, we conduct experiments on the multi-agent MuJoCo and SMAC benchmarks. After training policies with HiSSD on offline multi-task data, the empirical results show that HiSSD assigns effective cooperative behaviors and obtains superior performance on unseen tasks.
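The hierarchical idea in the abstract — a high-level controller choosing among common skills shared across tasks, and a low-level head that combines the chosen skill with a task-specific embedding to emit actions — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all class names, dimensions, and the linear-map parameterization are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class HierarchicalSkillPolicy:
    """Hypothetical sketch of a hierarchical common/task-specific skill
    policy. A high-level scorer selects one common skill (shared across
    tasks); a low-level head conditions on the observation, the selected
    common skill, and a task-specific skill embedding to produce an action.
    Linear maps stand in for the actual networks."""

    def __init__(self, obs_dim, n_common_skills, skill_dim, act_dim, n_tasks):
        # Common skill codebook: shared by all tasks.
        self.common_skills = rng.normal(size=(n_common_skills, skill_dim))
        # Task-specific skill embeddings: one per training task.
        self.task_skills = rng.normal(size=(n_tasks, skill_dim))
        # Stand-ins for the high-level and low-level networks.
        self.W_high = rng.normal(size=(obs_dim, n_common_skills))
        self.W_low = rng.normal(size=(obs_dim + 2 * skill_dim, act_dim))

    def act(self, obs, task_id):
        # High level: score common skills and pick the best one.
        logits = obs @ self.W_high
        z_common = self.common_skills[np.argmax(logits)]
        # Low level: fuse observation, common skill, and task-specific skill.
        z_task = self.task_skills[task_id]
        x = np.concatenate([obs, z_common, z_task])
        return np.tanh(x @ self.W_low)  # bounded continuous action

policy = HierarchicalSkillPolicy(obs_dim=8, n_common_skills=4,
                                 skill_dim=3, act_dim=2, n_tasks=2)
action = policy.act(rng.normal(size=8), task_id=0)
print(action.shape)  # (2,)
```

Because the common codebook is shared while only `task_skills` varies per task, transferring to a new task in this sketch would amount to inferring a new task embedding while reusing the common skills.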

Sicong Liu, Yang Shu, Chenjuan Guo, Bin Yang • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-Agent Reinforcement Learning | SMAC v2 (test) | Win Rate (Protoss 5 Units) | 28.5 | 24
Multi-Agent Reinforcement Learning on Unseen Tasks | SMAC Stalker-Zealot Medium Quality | 1s3z Performance | 65.6 | 12
Offline Multi-Agent Reinforcement Learning | SMAC Expert Marine-Hard | Performance at 3m | 99.4 | 8
Multi-Agent Reinforcement Learning | SMAC Stalker-Zealot Unseen (test) | Mean Win Rate | 88.8 | 8
Cooperative Navigation | Multi-Agent Particle Environment (Expert) | CN-2 Result | 100 | 4
Multi-Agent Reinforcement Learning | SMAC Medium-Replay v2 | Score (Terran, 3 units) | 31.3 | 4
Offline Multi-Agent Reinforcement Learning | Marine-Easy Expert | Score (3m) | 100 | 4
Offline Multi-Agent Reinforcement Learning | Marine-Easy Medium-Replay | Performance (3m) | 87.5 | 4
Cooperative Navigation | MPE Cooperative Navigation Medium (Source and Unseen Tasks) | CN-2 Score | 38.8 | 4
Multi-Agent Reinforcement Learning | SMAC v2 (seen) | Terran Win Rate | 24.9 | 4
Showing 10 of 25 benchmark results.
