Adaptive Ensemble Aggregation for Actor-Critics

About

Ensembles are ubiquitous in off-policy actor-critic learning, yet their efficacy depends critically on how they are aggregated. Current methods typically rely on static rules or task-specific hyperparameters to balance overestimation bias against variance, so a truly adaptive approach remains an open challenge. We introduce Adaptive Ensemble Aggregation (AEA), an algorithm that dynamically constructs ensemble-based targets for both critic and actor updates directly from training dynamics. We prove that AEA converges to a unique equilibrium at which the aggregation parameter minimizes value estimation error within a defined stability region. We further establish that AEA has a shrinkage property: the estimation bias vanishes as the total ensemble size grows. Subset-based methods such as REDQ hit an information bottleneck determined by a fixed variance floor, regardless of the ensemble size. AEA, by contrast, exploits the full ensemble to achieve optimal variance reduction, scaling inversely with the total number of models, and maximal Fisher information. We also provide a formal guarantee of monotonic policy improvement under this adaptive regime. Extensive evaluations on a range of continuous control tasks show that AEA outperforms state-of-the-art baselines on the majority of tasks, providing a robust and self-calibrating framework for ensemble-based reinforcement learning.
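The variance contrast drawn in the abstract can be illustrated with a small simulation. The sketch below is not AEA: the abstract does not specify AEA's aggregation rule, so a plain mean over all critics stands in for "using the full ensemble", and a minimum over a random subset of critics stands in for a REDQ-style target. The i.i.d. unit-variance Gaussian noise model, the function names, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def full_ensemble_mean(q_values):
    """Aggregate by averaging all N critic estimates (one row per trial)."""
    return q_values.mean(axis=1)


def subset_min(q_values, subset_size, rng):
    """REDQ-style stand-in: minimum over a random subset of critics."""
    n_trials, n_critics = q_values.shape
    # Draw a random subset without replacement, independently for each trial,
    # by taking the first `subset_size` positions of a random permutation.
    idx = rng.random((n_trials, n_critics)).argsort(axis=1)[:, :subset_size]
    return np.take_along_axis(q_values, idx, axis=1).min(axis=1)


def simulate(ensemble_sizes=(2, 10, 50), subset_size=2, n_trials=20000, seed=0):
    """Return {N: (variance of full mean, variance of subset min)}."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in ensemble_sizes:
        # Assumption: each critic equals the true value (0) plus
        # independent unit-variance Gaussian noise.
        q = rng.normal(0.0, 1.0, size=(n_trials, n))
        var_mean = full_ensemble_mean(q).var()
        var_min = subset_min(q, subset_size, rng).var()
        results[n] = (var_mean, var_min)
    return results


if __name__ == "__main__":
    for n, (var_mean, var_min) in simulate().items():
        print(f"N={n:3d}  var(full mean)={var_mean:.4f}  var(subset min)={var_min:.4f}")
```

Under these assumptions the variance of the full-ensemble mean shrinks roughly as 1/N, while the variance of the subset minimum stays near the same level for every N because it depends only on the subset size. Real critics are correlated and biased, so the numbers indicate the qualitative effect only.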

Nicklas Werge, Yi-Shan Wu, Manuel Haussmann, Bahareh Tasdighi, Melih Kandemir • 2025

Related benchmarks

Task	Dataset	Result
Continuous Control	MuJoCo v5	Ant Score 4.74e+3	15
Continuous Control	DeepMind Control Suite (DMC)	Cheetah Run 827	15
Continuous Control	MuJoCo	Ant-v5 4.74e+3	9
Continuous Control	DMC	Cheetah-run Score 827	5

