
CaRe-BN: Precise Moving Statistics for Stabilizing Spiking Neural Networks in Reinforcement Learning

About

Spiking Neural Networks (SNNs) offer low-latency and energy-efficient decision-making on neuromorphic hardware by mimicking the event-driven dynamics of biological neurons. However, the discrete and non-differentiable nature of spikes leads to unstable gradient propagation in directly trained SNNs, making Batch Normalization (BN) an important component for stabilizing training. In online Reinforcement Learning (RL), imprecise BN statistics hinder exploitation, resulting in slower convergence and suboptimal policies. While Artificial Neural Networks (ANNs) can often omit BN, SNNs critically depend on it, limiting the adoption of SNNs for energy-efficient control on resource-constrained devices. To overcome this, we propose Confidence-adaptive and Re-calibration Batch Normalization (CaRe-BN), which introduces (i) a confidence-guided adaptive update strategy for BN statistics and (ii) a re-calibration mechanism to align distributions. By providing more accurate normalization, CaRe-BN stabilizes SNN optimization without disrupting the RL training process. Importantly, CaRe-BN does not alter inference, thus preserving the energy efficiency of SNNs in deployment. Extensive experiments on both discrete and continuous control benchmarks demonstrate that CaRe-BN improves SNN performance by up to 22.6% across different spiking neuron models and RL algorithms. Remarkably, SNNs equipped with CaRe-BN even surpass their ANN counterparts by 5.9%. These results highlight a new direction for BN techniques tailored to RL, paving the way for neuromorphic agents that are both efficient and high-performing. Code is available at https://github.com/xuzijie32/CaRe-BN.
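The abstract names the two mechanisms but does not give their update rules, so the following is only a minimal sketch under stated assumptions (one BN channel, plain Python, biased batch variance). It contrasts the standard fixed-momentum moving average that BN uses for its running statistics with two hypothetical stand-ins: a Kalman-style confidence-weighted update, where the step size depends on how noisy the batch mean is relative to how stale the running estimate is, and a re-calibration step that recomputes statistics directly from a buffer of recent activations. The function names, the `drift` parameter, and the buffer size are illustrative assumptions and are not CaRe-BN's actual algorithm; the linked repository is the authoritative reference.

```python
import random
from statistics import fmean, pvariance


def batch_stats(batch):
    """Mean and biased (population) variance of one mini-batch, one channel."""
    return fmean(batch), pvariance(batch)


def ema_update(running_mean, running_var, batch, momentum=0.1):
    """Standard BN running-statistics update with a fixed momentum."""
    mu, var = batch_stats(batch)
    new_mean = (1 - momentum) * running_mean + momentum * mu
    new_var = (1 - momentum) * running_var + momentum * var
    return new_mean, new_var


def confidence_weighted_update(running_mean, running_var, mean_uncertainty,
                               batch, drift=1e-3):
    """HYPOTHETICAL confidence-adaptive update (Kalman-style), not CaRe-BN's rule.

    mean_uncertainty: variance of the current estimate of the mean.
    drift: assumed per-step growth of that uncertainty, since RL data keeps
    shifting as the policy changes. A positive drift keeps the denominator
    nonzero even when a batch has zero variance.
    """
    mu, var = batch_stats(batch)
    prior_unc = mean_uncertainty + drift        # old estimate becomes less trusted
    obs_unc = var / len(batch)                  # squared standard error of batch mean
    gain = prior_unc / (prior_unc + obs_unc)    # confident batch -> gain close to 1
    new_mean = running_mean + gain * (mu - running_mean)
    # Simplification: reuse the same gain for the variance estimate.
    new_var = running_var + gain * (var - running_var)
    new_unc = (1 - gain) * prior_unc            # uncertainty shrinks after the update
    return new_mean, new_var, new_unc


def recalibrate(buffer):
    """Re-estimate statistics directly from a larger buffer of recent activations."""
    return batch_stats(buffer)


if __name__ == "__main__":
    random.seed(0)
    m_ema, v_ema = 0.0, 1.0
    m_cw, v_cw, unc = 0.0, 1.0, 1.0
    recent = []
    for step in range(200):
        true_mean = 5.0 * step / 199            # activation mean drifts during training
        batch = [random.gauss(true_mean, 1.0) for _ in range(32)]
        m_ema, v_ema = ema_update(m_ema, v_ema, batch)
        m_cw, v_cw, unc = confidence_weighted_update(m_cw, v_cw, unc, batch)
        recent = (recent + batch)[-256:]        # sliding buffer of the last 8 batches
    m_rc, v_rc = recalibrate(recent)
    print(f"final true mean 5.00 | EMA {m_ema:.2f} | "
          f"confidence-weighted {m_cw:.2f} | recalibrated {m_rc:.2f}")
```

The fixed momentum in `ema_update` moves toward each batch at the same rate regardless of how noisy that batch is, whereas the hypothetical gain in `confidence_weighted_update` adapts to the ratio of estimate uncertainty to batch noise. `recalibrate` trades memory for an estimate that depends only on recent activations. In every variant the statistics are fixed once training ends, so inference is unchanged.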

Zijie Xu, Xinyu Shi, Yiting Dong, Zihan Huang, Zhaofei Yu • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continuous Control | MuJoCo Walker2d v4 | Normalized Performance | 85.92 | 34 |
| Continuous Control | MuJoCo Hopper v4 | Normalized Performance | 3.59e+3 | 28 |
| Continuous Control | MuJoCo Ant v4 | -- | -- | 24 |
| Reinforcement Learning | MuJoCo v4 (test) | Avg Return (Ant-v4) | 5.08e+3 | 11 |
| Continuous Control | MuJoCo Suite Aggregate | Average Performance Gain (APG) | 5.9 | 10 |
| Continuous Control | MuJoCo IDP v4 | Max Average Return | 9.35e+3 | 10 |
