FedQHD: Closed-Form Function-Space Federated Reinforcement Learning

About

Federated reinforcement learning enables decentralized agents to collaboratively improve policies or value estimates without exchanging raw trajectories. However, FedAvg-style parameter averaging is not function-space consistent: when clients use heterogeneous encoders or even identical nonlinear networks, averaged parameters need not correspond to the weighted average of client value functions in any common function space. We propose FedQHD, a federated Q-learning method using hyperdimensional (random-feature) state encoders with a linear readout, so that Q-functions are nonlinear in state yet linear in trainable parameters. This linear structure enables closed-form aggregation. With a shared encoder, the function-space consensus update coincides exactly with weighted averaging of local readout matrices. With heterogeneous encoders, the server constructs a global teacher by averaging client Q-values on a shared anchor-state set, and each client compiles this teacher into its local representation via a single ridge projection. We formalize the federation gap -- the error incurred when compiling a federated teacher into a heterogeneous client representation -- relative to a client-specific oracle projection. We show that this gap decomposes into subspace misalignment, anchor-set conditioning, and regularization bias. We further identify the anchor-to-dimension ratio $m \geq D_i$ as the well-conditioned regime in which the gap reduces to a multiple of the encoder heterogeneity floor. On four continuous-state, discrete-action control benchmarks, FedQHD matches or outperforms FedAvg-style baselines and distillation-based alternatives while requiring substantially less computation, and the empirical dependence of the federation gap on encoder dimension matches our theoretical analysis.
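The two aggregation modes described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The cosine random-feature encoder, the function names (`make_encoder`, `aggregate_shared`, `teacher_on_anchors`, `ridge_compile`), the regularization strength `lam`, and the toy dimensions are all assumptions chosen for illustration. The sketch only assumes what the abstract states: Q-values are a linear readout of fixed encoder features, the teacher is a weighted average of client Q-values on shared anchor states, and each client fits that teacher with one ridge regression.

```python
import numpy as np


def make_encoder(state_dim, feat_dim, seed):
    """Hypothetical random-feature encoder: fixed random projection plus cosine."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(state_dim, feat_dim))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=feat_dim)

    def encode(states):
        # states: (n, state_dim) -> features: (n, feat_dim)
        return np.sqrt(2.0 / feat_dim) * np.cos(states @ proj + phase)

    return encode


def normalize(weights):
    """Scale client weights so they sum to one."""
    w = np.asarray(weights, dtype=float)
    return w / w.sum()


def aggregate_shared(readouts, weights):
    """Shared encoder: weighted average of readout matrices, each (D, n_actions)."""
    return np.tensordot(normalize(weights), np.stack(readouts), axes=1)


def teacher_on_anchors(encoders, readouts, weights, anchors):
    """Heterogeneous encoders: weighted average of client Q-values on the anchors."""
    q_values = np.stack([enc(anchors) @ r for enc, r in zip(encoders, readouts)])
    return np.tensordot(normalize(weights), q_values, axes=1)  # (m, n_actions)


def ridge_compile(features, q_teacher, lam=1e-3):
    """Single ridge projection of the teacher onto one client's features.

    Solves (F^T F + lam * I) W = F^T Q for the readout W of shape (D_i, n_actions).
    """
    gram = features.T @ features + lam * np.eye(features.shape[1])
    return np.linalg.solve(gram, features.T @ q_teacher)


# Toy setup (all values are illustrative): three clients with different
# encoder dimensions D_i and m = 256 anchor states, so m >= D_i for each.
rng = np.random.default_rng(0)
state_dim, n_actions, m = 4, 2, 256
dims = [32, 48, 64]
weights = [1.0, 2.0, 1.0]
anchors = rng.normal(size=(m, state_dim))
encoders = [make_encoder(state_dim, d, seed=i + 1) for i, d in enumerate(dims)]
readouts = [rng.normal(size=(d, n_actions)) for d in dims]

q_teacher = teacher_on_anchors(encoders, readouts, weights, anchors)
new_readouts = [ridge_compile(enc(anchors), q_teacher) for enc in encoders]
```

When every client uses the same encoder, `enc(anchors) @ aggregate_shared(readouts, weights)` equals the averaged Q-values exactly, because the Q-function is linear in the readout. With different encoders the ridge fit is only an approximation, and its residual on the anchors is one rough way to look at the federation gap the abstract describes.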

Yuchen Hou, Yongshan Chen, Zhuowen Zou, Calvin Yeung, Mohsen Imani, Tian Lan, Mahdi Imani • 2026

Related benchmarks

Task	Dataset	Metric	Value
Classic Discrete Control	CartPole v1	Mean Episodic Return	466.3	18
Classic Discrete Control	MountainCar v0	Mean Episodic Return	162.3	18
Reinforcement Learning	Acrobot	Training Time (min)	1.9	13
Continuous-state and discrete-action control	LunarLander v3	Average Reward	224.1	13
Reinforcement Learning	MountainCar	Training Time (min)	1.4	13
Reinforcement Learning	cartpole	Wall-clock Training Time (min)	1.6	13
Reinforcement Learning	LunarLander	Training Time (min)	5.9	13
Continuous-state and discrete-action control	Acrobot v1	Final Average Reward	105	13

