Replacement Learning: Training Neural Networks with Fewer Parameters
About
End-to-end training with full-depth backpropagation (BP) remains the dominant paradigm for optimizing deep neural networks, but its efficiency deteriorates as models grow deeper. Since every block must be executed and differentiated under a single global objective, full-depth BP introduces substantial parameter redundancy, activation-memory cost, and training latency, especially when neighboring layers exhibit highly correlated learning patterns. Directly skipping or removing layers can reduce cost, but often weakens representation capacity or requires architecture-specific reuse designs. In this paper, we propose Replacement Learning (RepL), a training-time paradigm that reduces full-depth redundancy by replacing selected blocks rather than simply discarding them. For each removed block, RepL inserts a lightweight computing layer that synthesizes a surrogate operator from the parameters of its adjacent preceding and succeeding blocks through a learnable transformation, and applies the synthesized operator to the preceding activation. In this way, RepL preserves local contextual continuity while avoiding unnecessary full-layer computation. We instantiate RepL for CNNs and ViTs with tailored parameter-fusion blocks that handle convolutional channels, feature resolutions, and transformer submodules. Extensive experiments on CIFAR-10, SVHN, STL-10, ImageNet, COCO, and Cityscapes show that RepL reduces trainable parameters, GPU memory usage, and training time while matching or surpassing standard end-to-end training across classification, detection, and segmentation. Additional results on WikiText-2, transfer learning, inference throughput, checkpointing, stochastic depth, and INT8 quantization further demonstrate its generality and compatibility.
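The computing layer described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes dense blocks represented as plain weight matrices, and it assumes the "learnable transformation" is a scalar-weighted sum of the two neighboring weight matrices. The names `synthesize`, `computing_layer`, `alpha`, and `beta` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch of a computing layer standing in for a removed block.
# Assumption: the learnable transformation is a scalar-weighted sum of the
# neighbouring blocks' weights; the paper's actual fusion may differ.

def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def synthesize(w_prev, w_next, alpha, beta):
    """Build the surrogate weight from the two adjacent blocks' weights."""
    return [
        [alpha * p + beta * n for p, n in zip(row_p, row_n)]
        for row_p, row_n in zip(w_prev, w_next)
    ]


def computing_layer(h_prev, w_prev, w_next, alpha, beta):
    """Apply the synthesized operator to the preceding block's activation."""
    return matvec(synthesize(w_prev, w_next, alpha, beta), h_prev)


# Toy 2x2 weights of the preceding and succeeding blocks.
w_prev = [[1.0, 0.0], [0.0, 1.0]]
w_next = [[0.0, 2.0], [2.0, 0.0]]
alpha, beta = 0.5, 0.25  # the only trainable parameters of the replaced block

x = [1.0, 2.0]
h_prev = matvec(w_prev, x)                                    # preceding block
h_mid = computing_layer(h_prev, w_prev, w_next, alpha, beta)  # replaced block
h_next = matvec(w_next, h_mid)                                # succeeding block

full_block_params = len(w_prev) * len(w_prev[0])  # a dense 2x2 block has 4
surrogate_params = 2                              # alpha and beta only
```

Under these assumptions the replaced block contributes two trainable scalars instead of a full weight matrix, which is the kind of parameter saving the abstract refers to. The CNN and ViT variants would additionally need to reconcile channel counts, feature resolutions, and transformer submodules, which this sketch does not attempt.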
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Language Modeling | WikiText-2 (test) | PPL | 193.3 | 2333 |
| Image Classification | CIFAR-10 (test) | Accuracy | 94.01 | 882 |
| Object Detection | COCO (val) | mAP | 32.76 | 637 |
| Image Classification | SVHN (test) | Accuracy | 97.06 | 470 |
| Image Classification | STL-10 (test) | Accuracy | 80.45 | 364 |
| Image Classification | ImageNet (val) | Top-1 Accuracy | 78.31 | 163 |
| Semantic Segmentation | Cityscapes | Overall Accuracy | 95.89 | 8 |
| Image Classification | CIFAR-10 | Top-1 Accuracy | 95.89 | 2 |
| Image Classification | SVHN | Top-1 Accuracy | 96.97 | 2 |
| Image Classification | STL-10 | Top-1 Accuracy | 95.11 | 2 |