
FedQHD: Closed-Form Function-Space Federated Reinforcement Learning

About

Federated reinforcement learning enables decentralized agents to collaboratively improve policies or value estimates without exchanging raw trajectories. However, FedAvg-style parameter averaging is not function-space consistent: when clients use heterogeneous encoders or even identical nonlinear networks, averaged parameters need not correspond to the weighted average of client value functions in any common function space. We propose FedQHD, a federated Q-learning method using hyperdimensional (random-feature) state encoders with a linear readout, so that Q-functions are nonlinear in state yet linear in trainable parameters. This linear structure enables closed-form aggregation. With a shared encoder, the function-space consensus update coincides exactly with weighted averaging of local readout matrices. With heterogeneous encoders, the server constructs a global teacher by averaging client Q-values on a shared anchor-state set, and each client compiles this teacher into its local representation via a single ridge projection. We formalize the federation gap (the error incurred when compiling a federated teacher into a heterogeneous client representation) relative to a client-specific oracle projection. We show that this gap decomposes into subspace misalignment, anchor-set conditioning, and regularization bias. We further identify $m \geq D_i$ (at least as many anchor states as encoder dimensions for client $i$) as the well-conditioned regime in which the gap reduces to a multiple of the encoder heterogeneity floor. On four continuous-state, discrete-action control benchmarks, FedQHD matches or outperforms FedAvg-style baselines and distillation-based alternatives while requiring substantially less computation, and the empirical dependence of the federation gap on encoder dimension matches our theoretical analysis.
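The two aggregation rules described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine random-feature encoder, the function names (`make_encoder`, `aggregate_shared`, `teacher_on_anchors`, `ridge_compile`), the regularization strength `lam`, and all sizes are assumptions chosen only for illustration. The sketch checks that, with a shared encoder, averaging readout matrices gives the same Q-values as averaging client Q-functions, and shows one way a client might compile an anchor-set teacher into its own features with a single ridge solve.

```python
import numpy as np

def make_encoder(state_dim, feat_dim, seed):
    """Hypothetical random-feature encoder: phi(s) = cos(W s + b)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(feat_dim, state_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=feat_dim)
    def encode(S):
        # S has shape (n, state_dim); result has shape (n, feat_dim).
        return np.cos(S @ W.T + b)
    return encode

def aggregate_shared(readouts, weights):
    """Shared encoder: weighted average of readout matrices."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return np.tensordot(w, np.stack(readouts), axes=1)

def teacher_on_anchors(encoders, readouts, weights, anchors):
    """Heterogeneous encoders: weighted average of client Q-values on anchors."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * (enc(anchors) @ R)
               for wi, enc, R in zip(w, encoders, readouts))

def ridge_compile(encoder, anchors, teacher_q, lam=1e-3):
    """Fit a local readout to the teacher with one ridge-regression solve."""
    Phi = encoder(anchors)                      # (m, D_i)
    gram = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(gram, Phi.T @ teacher_q)

rng = np.random.default_rng(0)
state_dim, n_actions = 4, 2
weights = [0.5, 0.3, 0.2]
test_states = rng.normal(size=(10, state_dim))

# Case 1: shared encoder, so Q is linear in the readout.
shared = make_encoder(state_dim, 64, seed=1)
shared_readouts = [rng.normal(size=(64, n_actions)) for _ in weights]
global_readout = aggregate_shared(shared_readouts, weights)
q_from_avg_readout = shared(test_states) @ global_readout
avg_of_client_q = sum(w * (shared(test_states) @ R)
                      for w, R in zip(weights, shared_readouts))

# Case 2: heterogeneous encoders with m = 256 anchors >= every D_i.
dims = [32, 48, 64]
encoders = [make_encoder(state_dim, d, seed=10 + i) for i, d in enumerate(dims)]
readouts = [rng.normal(size=(d, n_actions)) for d in dims]
anchors = rng.normal(size=(256, state_dim))
teacher_q = teacher_on_anchors(encoders, readouts, weights, anchors)
compiled = [ridge_compile(enc, anchors, teacher_q) for enc in encoders]
anchor_residuals = [np.linalg.norm(enc(anchors) @ R - teacher_q)
                    for enc, R in zip(encoders, compiled)]
```

In this sketch, `q_from_avg_readout` and `avg_of_client_q` agree up to floating-point error because the Q-function is linear in the readout. The entries of `anchor_residuals` are generally nonzero, since no single client's feature space can represent the mixed teacher exactly; this anchor-set residual is only a rough stand-in for the federation gap, which the abstract defines relative to a client-specific oracle projection.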

Yuchen Hou, Yongshan Chen, Zhuowen Zou, Calvin Yeung, Mohsen Imani, Tian Lan, Mahdi Imani • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Classic Discrete Control | CartPole v1 | Mean Episodic Return: 466.3 | 18 |
| Classic Discrete Control | MountainCar v0 | Mean Episodic Return: 162.3 | 18 |
| Reinforcement Learning | Acrobot | Training Time (min): 1.9 | 13 |
| Continuous-state and discrete-action control | LunarLander v3 | Average Reward: 224.1 | 13 |
| Reinforcement Learning | MountainCar | Training Time (min): 1.4 | 13 |
| Reinforcement Learning | cartpole | Wall-clock Training Time (min): 1.6 | 13 |
| Reinforcement Learning | LunarLander | Training Time (min): 5.9 | 13 |
| Continuous-state and discrete-action control | Acrobot v1 | Final Average Reward: 105 | 13 |