Federated Reinforcement Learning with Environment Heterogeneity
About
We study a Federated Reinforcement Learning (FedRL) problem in which $n$ agents collaboratively learn a single policy without sharing the trajectories they collected during agent-environment interaction. We stress the constraint of environment heterogeneity: the $n$ environments corresponding to these $n$ agents have different state transitions. To obtain a value function or a policy function that optimizes the overall performance across all environments, we propose two federated RL algorithms, \texttt{QAvg} and \texttt{PAvg}. We prove that these algorithms converge to suboptimal solutions, and that the degree of suboptimality depends on how heterogeneous the $n$ environments are. Moreover, we propose a heuristic that achieves personalization by embedding the $n$ environments into $n$ vectors. The personalization heuristic not only improves training but also allows for better generalization to new environments.
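To make the averaging idea concrete, here is a minimal tabular sketch in the spirit of \texttt{QAvg}. It is not the paper's implementation. It assumes each agent knows its own transition kernel, that all agents share one reward table, and that local training consists of exact Bellman optimality backups. The names `local_bellman_update`, `qavg`, and `random_kernel` are hypothetical. Only the Q-tables are exchanged with the server; trajectories and kernels stay local.

```python
import numpy as np

def local_bellman_update(Q, P, R, gamma, steps):
    """Apply `steps` Bellman optimality backups using one agent's own transitions.

    Q: (S, A) action values; P: (S, A, S) transition probabilities;
    R: (S, A) rewards (assumed here to be shared by all agents).
    """
    for _ in range(steps):
        V = Q.max(axis=1)          # greedy state values, shape (S,)
        Q = R + gamma * (P @ V)    # (S, A, S) @ (S,) -> (S, A)
    return Q

def qavg(kernels, R, gamma=0.9, rounds=200, local_steps=5):
    """Alternate local updates with server-side averaging of Q-tables."""
    Q = np.zeros(R.shape)
    for _ in range(rounds):
        # Each agent starts from the shared table and trains on its own environment.
        local_Qs = [local_bellman_update(Q.copy(), P, R, gamma, local_steps)
                    for P in kernels]
        # The server only sees Q-tables, never trajectories or transition kernels.
        Q = np.mean(local_Qs, axis=0)
    return Q

def random_kernel(rng, n_states, n_actions):
    """Draw a random transition kernel whose rows sum to one (illustrative only)."""
    P = rng.random((n_states, n_actions, n_states))
    return P / P.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
S, A, n = 4, 2, 3                      # toy sizes, chosen arbitrarily
R = rng.random((S, A))
kernels = [random_kernel(rng, S, A) for _ in range(n)]   # heterogeneous environments
Q_avg = qavg(kernels, R)
greedy_policy = Q_avg.argmax(axis=1)   # one policy shared by all agents
```

When all kernels are identical, this sketch reduces to ordinary value iteration. With different kernels, the averaged table is in general not optimal for any single environment, which matches the abstract's statement that the suboptimality depends on how heterogeneous the environments are.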
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Classic Discrete Control | MountainCar v0 | Mean Episodic Return | 167.8 | 18 |
| Classic Discrete Control | CartPole v1 | Mean Episodic Return | 120.8 | 18 |
| Continuous-state and discrete-action control | LunarLander v3 | Average Reward | 200.9 | 13 |
| Continuous-state and discrete-action control | Acrobot v1 | Final Average Reward | 85.3 | 13 |
| Reinforcement Learning | cartpole | Wall-clock Training Time (min) | 25.9 | 13 |
| Reinforcement Learning | Acrobot | Training Time (min) | 22.8 | 13 |
| Reinforcement Learning | LunarLander | Training Time (min) | 36.8 | 13 |
| Reinforcement Learning | MountainCar | Training Time (min) | 85.6 | 13 |