CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning

About

Spiking Neural Networks (SNNs) offer low-latency and energy-efficient decision-making on neuromorphic hardware by mimicking the event-driven dynamics of biological neurons. However, the discrete and non-differentiable nature of spikes leads to unstable gradient propagation in directly trained SNNs, making Batch Normalization (BN) an important component for stabilizing training. In online Reinforcement Learning (RL), imprecise BN statistics hinder exploitation, resulting in slower convergence and suboptimal policies. While Artificial Neural Networks (ANNs) can often omit BN, SNNs critically depend on it, limiting the adoption of SNNs for energy-efficient control on resource-constrained devices. To overcome this, we propose Confidence-adaptive and Re-calibration Batch Normalization (CaRe-BN), which introduces (i) a confidence-guided adaptive update strategy for BN statistics and (ii) a re-calibration mechanism to align distributions. By providing more accurate normalization, CaRe-BN stabilizes SNN optimization without disrupting the RL training process. Importantly, CaRe-BN does not alter inference, thus preserving the energy efficiency of SNNs in deployment. Extensive experiments on both discrete and continuous control benchmarks demonstrate that CaRe-BN improves SNN performance by up to 22.6% across different spiking neuron models and RL algorithms. Remarkably, SNNs equipped with CaRe-BN even surpass their ANN counterparts by 5.9%. These results highlight a new direction for BN techniques tailored to RL, paving the way for neuromorphic agents that are both efficient and high-performing. Code is available at https://github.com/xuzijie32/CaRe-BN.
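The abstract does not spell out CaRe-BN's update rules, so the following is only a minimal, illustrative sketch for a single BN channel, written under stated assumptions. It contrasts the standard fixed-momentum exponential moving average (EMA) of BN running statistics with a hypothetical confidence-weighted update (an inverse-variance, Kalman-style gain, swapped in here to illustrate the idea of confidence-guided updates; it is not the paper's rule) and a re-calibration step that re-estimates the statistics from a larger sample. The names `RunningStats`, `ema_update`, `confidence_update`, `recalibrate`, and the `drift` parameter are all hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean/variance for one BN channel (illustrative only)."""
    mean: float = 0.0
    var: float = 1.0
    # Hypothetical: uncertainty (variance) of the running-mean estimate itself.
    mean_uncertainty: float = 1.0


def batch_moments(xs):
    """Mean and population (biased) variance of a batch of activations."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / n
    return m, v


def ema_update(stats, xs, momentum=0.1):
    """Standard BN-style update: fixed-momentum exponential moving average."""
    m, v = batch_moments(xs)
    stats.mean = (1 - momentum) * stats.mean + momentum * m
    stats.var = (1 - momentum) * stats.var + momentum * v
    return stats


def confidence_update(stats, xs, drift=1e-3):
    """Hypothetical confidence-weighted update (NOT the paper's exact rule).

    The batch mean is treated as a noisy observation of the true mean with
    variance v / n. The running estimate's uncertainty grows by `drift` each
    step so the gain never collapses to zero under non-stationary RL data.
    For simplicity the same gain is reused for the variance estimate.
    """
    m, v = batch_moments(xs)
    prior_unc = stats.mean_uncertainty + drift
    obs_unc = v / len(xs) + 1e-12
    gain = prior_unc / (prior_unc + obs_unc)  # in (0, 1): trust in the new batch
    stats.mean += gain * (m - stats.mean)
    stats.var += gain * (v - stats.var)
    stats.mean_uncertainty = (1 - gain) * prior_unc
    return stats, gain


def recalibrate(stats, samples):
    """Overwrite drifting running statistics with estimates from a large sample
    (assumed here to be activations recomputed on many stored states)."""
    m, v = batch_moments(samples)
    stats.mean, stats.var = m, v
    stats.mean_uncertainty = v / len(samples)
    return stats


def normalize(x, stats, eps=1e-5):
    """Inference-time normalization with the stored statistics."""
    return (x - stats.mean) / math.sqrt(stats.var + eps)


if __name__ == "__main__":
    rng = random.Random(0)
    ema, conf = RunningStats(), RunningStats()
    gain = 0.0
    for step in range(200):
        # Simulated distribution shift halfway through, as when the policy changes.
        true_mean = 0.0 if step < 100 else 2.0
        batch = [rng.gauss(true_mean, 1.0) for _ in range(32)]
        ema_update(ema, batch)
        _, gain = confidence_update(conf, batch)
    print(f"EMA mean={ema.mean:.3f}  confidence mean={conf.mean:.3f}  last gain={gain:.3f}")
    # Re-calibrate from a large sample drawn from the current distribution.
    recalibrate(ema, [rng.gauss(2.0, 1.0) for _ in range(10_000)])
    print(f"re-calibrated EMA mean={ema.mean:.3f}")
```

In this sketch the confidence-weighted gain starts close to 1, so early batches dominate instead of being averaged against an arbitrary initial value, and it shrinks as the estimate becomes more certain. Note that only the training-time statistics change; `normalize` at inference is ordinary BN, which is consistent with the abstract's claim that inference is unaltered. For the authors' actual method, refer to the linked repository.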

Zijie Xu, Xinyu Shi, Yiting Dong, Zihan Huang, Zhaofei Yu • 2025

Related benchmarks

Task	Dataset	Result
Continuous Control	MuJoCo Walker2d v4	Normalized Performance: 85.92	51
Continuous Control	MuJoCo Ant v4	Average Return: 5.37e+3	46
Continuous Control	MuJoCo Hopper v4	Normalized Performance: 3.59e+3	28
Reinforcement Learning	MuJoCo v4 (test)	Avg Return (Ant-v4): 5.08e+3	11
Continuous Control	MuJoCo Suite Aggregate	Average Performance Gain (APG): 5.9	10
Continuous Control	MuJoCo IDP v4	Max Average Return: 9.35e+3	10
