SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning

About

In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating a loss of plasticity. To address this, building on Neural Tangent Kernel (NTK) theory, we formalize the plasticity loss in MoE policies as a loss of spectral plasticity. We then derive a tractable proxy for spectral plasticity that is expressible in terms of the individual expert feature matrices. Leveraging this proxy, we introduce SPHERE, a practical Parseval penalty tailored for MoE-based policies that alleviates the loss of spectral plasticity. On MetaWorld and HumanoidBench, SPHERE improves average success under continual RL by 133% and 50%, respectively, over an unregularized MoE baseline, while maintaining higher spectral plasticity throughout training.

Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li • 2026
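
The abstract describes SPHERE as a Parseval penalty built from the individual expert feature matrices, but does not give its formula. As a rough illustration only, the sketch below implements a generic Parseval-style regularizer: for each expert it measures how far the normalized Gram matrix of that expert's features is from the identity, then averages over experts. The function names, the 1/n normalization, and the averaging across experts are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def parseval_penalty(features: np.ndarray) -> float:
    """Generic Parseval-style penalty for one feature matrix (assumed form).

    features has shape (n_samples, n_features). The penalty is the squared
    Frobenius distance between the normalized Gram matrix and the identity.
    It is zero when the feature columns are orthogonal with unit mean square.
    """
    n, d = features.shape
    gram = features.T @ features / n  # (d, d) normalized Gram matrix
    return float(np.sum((gram - np.eye(d)) ** 2))


def moe_parseval_penalty(expert_features) -> float:
    """Average the per-expert penalty over a list of expert feature matrices.

    Averaging across experts is an illustrative choice, not the paper's.
    """
    return float(np.mean([parseval_penalty(f) for f in expert_features]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Expert with well-spread features: orthonormal columns, rescaled.
    q, _ = np.linalg.qr(rng.normal(size=(8, 3)))
    healthy = np.sqrt(8) * q
    # Expert whose features have collapsed to a single direction.
    collapsed = np.ones((8, 3))
    print(parseval_penalty(healthy))
    print(parseval_penalty(collapsed))
    print(moe_parseval_penalty([healthy, collapsed]))
```

In this toy setting the collapsed expert receives a much larger penalty than the well-conditioned one, which is the qualitative behavior a regularizer aimed at preserving a spread-out feature spectrum would be expected to have.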

Related benchmarks

Task	Dataset	Result
Continual Reinforcement Learning	HumanoidBench CRL	Mean Score: 53	7
Continual Learning	SplitCIFAR-100	Average Success Rate: 48	2
Continual Learning	20 Newsgroups (test)	Avg Success Rate: 27	2
Continual Reinforcement Learning	MetaWorld CRL	Mean: 0.41	2
Continual Reinforcement Learning	HumanoidBench First Five Tasks Stand Walk Pole Slide Run (final checkpoint after five-task sequence)	Average Success Rate: 25	2
