FedHPD: Heterogeneous Federated Reinforcement Learning via Policy Distillation

About

Federated Reinforcement Learning (FedRL) improves sample efficiency while preserving privacy; however, most existing studies assume homogeneous agents, limiting its applicability in real-world scenarios. This paper investigates FedRL in black-box settings with heterogeneous agents, where each agent employs distinct policy networks and training configurations without disclosing their internal details. Knowledge Distillation (KD) is a promising method for facilitating knowledge sharing among heterogeneous models, but it faces challenges related to the scarcity of public datasets and limitations in knowledge representation when applied to FedRL. To address these challenges, we propose Federated Heterogeneous Policy Distillation (FedHPD), which solves the problem of heterogeneous FedRL by utilizing action probability distributions as a medium for knowledge sharing. We provide a theoretical analysis of FedHPD's convergence under standard assumptions. Extensive experiments corroborate that FedHPD shows significant improvements across various reinforcement learning benchmark tasks, further validating our theoretical findings. Moreover, additional experiments demonstrate that FedHPD operates effectively without the need for an elaborate selection of public datasets.
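The idea of sharing action probability distributions rather than model parameters can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each agent evaluates its policy on a shared set of public states, that a server averages the resulting distributions, and that each agent then computes a KL-divergence loss against that average. The function names, the averaging rule, the KL direction, and the toy logits are all hypothetical choices made for illustration.

```python
import math

def softmax(logits):
    """Turn raw policy logits into an action probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average_distributions(agent_probs):
    """Assumed server step: average per-state action distributions over agents.

    agent_probs[k][s] is agent k's distribution on public state s.
    """
    n_agents = len(agent_probs)
    n_states = len(agent_probs[0])
    n_actions = len(agent_probs[0][0])
    return [
        [sum(agent_probs[k][s][a] for k in range(n_agents)) / n_agents
         for a in range(n_actions)]
        for s in range(n_states)
    ]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(consensus, local_probs):
    """Assumed agent step: mean KL between consensus and local policy outputs."""
    total = sum(kl_divergence(c, l) for c, l in zip(consensus, local_probs))
    return total / len(consensus)

# Toy example: two agents, two public states, two discrete actions.
# Only these output distributions are exchanged, so the agents may use
# entirely different network architectures internally.
agent_logits = [
    [[2.0, 0.0], [0.0, 1.0]],  # hypothetical agent 0
    [[0.5, 0.5], [1.0, 3.0]],  # hypothetical agent 1
]
agent_probs = [[softmax(l) for l in states] for states in agent_logits]
consensus = average_distributions(agent_probs)
losses = [distillation_loss(consensus, p) for p in agent_probs]
```

In a real training loop each agent would backpropagate this loss through its own policy network. Because only probability vectors over a common action space cross the agent boundary, the exchange works in the black-box setting the abstract describes.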

Wenzheng Jiang, Ji Wang, Xiongtao Zhang, Weidong Bao, Cheston Tan, Flint Xiaofeng Fan • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Classic Discrete Control | MountainCar v0 | Mean Episodic Return | 174.3 | 18 |
| Classic Discrete Control | CartPole v1 | Mean Episodic Return | 169.9 | 18 |
| Continuous-state and discrete-action control | LunarLander v3 | Average Reward | 131.4 | 13 |
| Continuous-state and discrete-action control | Acrobot v1 | Final Average Reward | 89.2 | 13 |
| Reinforcement Learning | MountainCar | Training Time (min) | 28.1 | 13 |
| Reinforcement Learning | cartpole | Wall-clock Training Time (min) | 10.3 | 13 |
| Reinforcement Learning | Acrobot | Training Time (min) | 19.7 | 13 |
| Reinforcement Learning | LunarLander | Training Time (min) | 31.3 | 13 |