On the Stability of Growth in Structural Plasticity
About
Standard deep-learning pipelines usually choose the network architecture before training and keep it fixed throughout optimization. In contrast, a model can also be adapted by editing its structure during training, for example by pruning existing hidden-neuron units or growing new ones. Although growth is appealing for adaptive and continual systems, we show that it is not simply the inverse of pruning. Pruning selects among units that have participated in training from the start, whereas growth inserts new units into an already specialized optimization trajectory. We isolate this insertion problem and show that newborn units are often forward-active but backward-starved: they participate in the forward computation, yet receive much weaker gradient signal than incumbent units. This disadvantage is minor in small MLP benchmarks, but becomes clear in harder image-classification settings with a convolutional trunk. In these settings, Grow can achieve high final accuracy during the structural-editing procedure, while Prune is stronger when performance is averaged over the training trajectory or when the final sparse network is retrained from scratch. Interventions targeting optimizer state, insertion, selection, and trainability show that improving the integration of newborn units can improve adaptive performance, but does not automatically produce better final subnetworks. In continual-learning benchmarks stressing plasticity loss, Grow becomes competitive mainly when new units have enough time to integrate. Together, these results suggest that Grow should be evaluated not only as an architecture-search operator, but as a time-sensitive optimization process whose success depends on insertion stability.
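The "forward-active but backward-starved" condition can be illustrated with a small diagnostic. The sketch below is not the authors' procedure: it assumes a one-hidden-layer ReLU MLP with squared-error loss, and it assumes newborn units are inserted with small outgoing weights (a common near-function-preserving choice; the abstract does not state the initialization). The names `per_unit_grad_norms`, `W1`, `W2`, and the scale `1e-3` are illustrative. It compares how often newborn units fire with how much gradient reaches their incoming weights.

```python
import numpy as np

def per_unit_grad_norms(W1, W2, X, Y):
    """Gradient norm of the loss w.r.t. each hidden unit's incoming weights.

    Model (assumed for illustration): out = relu(X @ W1) @ W2,
    loss = 0.5 * mean over samples of ||out - Y||^2.
    Returns (norms, H) where norms has one entry per hidden unit.
    """
    H_pre = X @ W1                      # pre-activations, shape (n, h)
    H = np.maximum(H_pre, 0.0)          # ReLU activations
    out = H @ W2                        # predictions, shape (n, k)
    d_out = (out - Y) / X.shape[0]      # dLoss/d_out
    d_H = d_out @ W2.T                  # backprop through the output layer
    d_pre = d_H * (H_pre > 0)           # backprop through ReLU
    gW1 = X.T @ d_pre                   # dLoss/dW1, shape (d, h)
    return np.linalg.norm(gW1, axis=0), H

rng = np.random.default_rng(0)
d, h_old, h_new, k, n = 16, 32, 8, 4, 256
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, k))

# Incumbent units: standard-scale incoming and outgoing weights.
W1_old = rng.normal(size=(d, h_old)) / np.sqrt(d)
W2_old = rng.normal(size=(h_old, k)) / np.sqrt(h_old)

# Newborn units (ASSUMPTION): standard incoming weights, tiny outgoing weights.
W1_new = rng.normal(size=(d, h_new)) / np.sqrt(d)
W2_new = 1e-3 * rng.normal(size=(h_new, k)) / np.sqrt(h_old)

W1 = np.concatenate([W1_old, W1_new], axis=1)
W2 = np.concatenate([W2_old, W2_new], axis=0)

norms, H = per_unit_grad_norms(W1, W2, X, Y)
incumbent_grad = norms[:h_old].mean()
newborn_grad = norms[h_old:].mean()
newborn_active = (H[:, h_old:] > 0).mean()  # fraction of (sample, unit) pairs that fire

print(f"newborn firing rate:      {newborn_active:.2f}")
print(f"incumbent mean grad norm: {incumbent_grad:.3e}")
print(f"newborn mean grad norm:   {newborn_grad:.3e}")
```

Under these assumptions the newborn units fire on a substantial fraction of inputs, yet the gradient reaching their incoming weights is scaled down by their small outgoing weights. Whether this is the dominant mechanism in the paper's convolutional settings is not something this toy example can establish.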
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Continual Learning | CIFAR100 Split | Average Per-Task Accuracy | 22.1 | 117 |
| Continual Supervised Learning | CIFAR 5+1 | Total Average Online Task Accuracy | 38.3 | 49 |
| Continual Supervised Learning | Continual ImageNet | Total Average Online Task Accuracy | 72.2 | 49 |
| Continual Supervised Learning | CIFAR Random Label | Total Average Online Task Accuracy | 19.5 | 49 |
| Continual Learning | Permuted MNIST | Average Accuracy | 76.1 | 32 |
| Continual Learning | MNIST Random-Label | Average Accuracy | 23.7 | 32 |