On the Stability of Growth in Structural Plasticity

About

Standard deep-learning pipelines usually choose the network architecture before training and keep it fixed throughout optimization. Alternatively, a model can be adapted by editing its structure during training, for example by pruning existing hidden units or growing new ones. Although growth is appealing for adaptive and continual systems, we show that it is not simply the inverse of pruning. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory. We isolate this insertion problem and show that newborn units are often forward-active but backward-starved: they participate in the forward computation, yet receive much weaker gradient signal than incumbent units. This disadvantage is minor on small MLP benchmarks but becomes evident in harder image-classification settings with a convolutional trunk. In these settings, Grow can achieve high final accuracy during the structural-editing procedure, while Prune is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch. Interventions targeting optimizer state, insertion, selection, and trainability show that better integration of newborn units can improve adaptive performance, but does not automatically produce better final subnetworks. In continual-learning benchmarks that stress plasticity loss, Grow becomes competitive mainly when new units have enough time to integrate. Together, these results suggest that Grow should be evaluated not only as an architecture-search operator, but also as a time-sensitive optimization process whose success depends on insertion stability.
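The abstract does not say how newborn units are initialized. One common choice in growth methods, assumed here purely for illustration, is to give a new unit random incoming weights but near-zero outgoing weights, so that insertion barely perturbs the network's output. Under that assumption, the minimal NumPy sketch below (a two-layer ReLU MLP with a mean-squared-error loss; `forward`, `incoming_grad_norms`, `grow_unit`, and `out_scale` are all hypothetical names) shows one simple way a unit can be forward-active yet backward-starved: the gradient reaching its incoming weights is multiplied by its own outgoing weights. The incumbent units here are random rather than trained, so this illustrates only that scaling mechanism, not the specialization effect the paper studies.

```python
import numpy as np

def forward(W1, b1, W2, x):
    """Two-layer ReLU MLP: returns hidden pre-activations, activations, outputs."""
    z = x @ W1.T + b1          # (batch, hidden)
    h = np.maximum(z, 0.0)     # ReLU activations
    y = h @ W2.T               # (batch, out)
    return z, h, y

def incoming_grad_norms(W1, b1, W2, x, target):
    """Per-hidden-unit L2 norm of dL/dW1, with L = mean over batch of sum of squared errors."""
    z, _, y = forward(W1, b1, W2, x)
    dy = 2.0 * (y - target) / x.shape[0]   # dL/dy
    dh = dy @ W2                           # backprop through outgoing weights
    dz = dh * (z > 0)                      # ReLU derivative
    dW1 = dz.T @ x                         # (hidden, in)
    return np.linalg.norm(dW1, axis=1)

def grow_unit(W1, b1, W2, rng, out_scale=1e-3):
    """Append one hidden unit (assumed scheme): random incoming, near-zero outgoing weights."""
    n_in = W1.shape[1]
    w_in = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(1, n_in))
    w_out = rng.normal(0.0, out_scale, size=(W2.shape[0], 1))
    return np.vstack([W1, w_in]), np.append(b1, 0.0), np.hstack([W2, w_out])

# Usage: build a small random MLP, grow one unit, compare gradient signal.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out, batch = 8, 16, 3, 64
W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_out, n_hidden))
x = rng.normal(size=(batch, n_in))
target = rng.normal(size=(batch, n_out))

W1g, b1g, W2g = grow_unit(W1, b1, W2, rng)
norms = incoming_grad_norms(W1g, b1g, W2g, x, target)
_, h, _ = forward(W1g, b1g, W2g, x)

print("newborn active fraction:", (h[:, -1] > 0).mean())  # forward-active
print("incumbent mean grad norm:", norms[:-1].mean())
print("newborn grad norm:", norms[-1])                   # backward-starved
```

In this toy setting the newborn unit fires on a substantial share of inputs, yet its incoming-weight gradient is orders of magnitude smaller than the incumbents' average, because `dh[:, j]` scales with the unit's outgoing weights `W2[:, j]`. Interventions such as larger outgoing initialization or optimizer-state adjustments would act on exactly this term, though the paper's specific interventions are not reproduced here.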

Lute Lillo, Nick Cheney • 2026

Related benchmarks

Task	Dataset	Metric	Result
Continual Learning	CIFAR100 Split	Average Per-Task Accuracy	22.1	117
Continual Supervised Learning	CIFAR 5+1	Total Average Online Task Accuracy	38.3	49
Continual Supervised Learning	Continual ImageNet	Total Average Online Task Accuracy	72.2	49
Continual Supervised Learning	CIFAR Random Label	Total Average Online Task Accuracy	19.5	49
Continual Learning	Permuted MNIST	Average Accuracy	76.1	32
Continual Learning	MNIST Random-Label	Average Accuracy	23.7	32
