Multiscale Training of Convolutional Neural Networks
About
Training convolutional neural networks (CNNs) on high-resolution images is often bottlenecked by the cost of evaluating gradients of the loss on the finest spatial mesh. To address this, we propose Multiscale Gradient Estimation (MGE), a Multilevel Monte Carlo-inspired estimator that expresses the expected gradient on the finest mesh as a telescoping sum of gradients computed on progressively coarser meshes. By assigning larger batches to the cheaper coarse levels, MGE achieves the same variance as single-scale stochastic gradient estimation while reducing the number of fine-mesh convolutions by a factor of 4 with each downsampling. We further embed MGE within a Full-Multiscale training algorithm that solves the learning problem on coarse meshes first and "hot-starts" the next finer level, cutting the required fine-mesh iterations by an additional order of magnitude. Extensive experiments on image denoising, deblurring, inpainting, and super-resolution tasks using UNet, ResNet, and ESPCN backbones confirm the practical benefits: Full-Multiscale reduces computational cost by 4 to 16× with no significant loss in performance. Together, MGE and Full-Multiscale offer a principled, architecture-agnostic route to accelerating CNN training on high-resolution data without sacrificing accuracy, and they can be combined with other variance-reduction techniques or learning-rate schedules to further improve scalability.
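A minimal NumPy sketch of the two ideas in the abstract, under stated assumptions. It is not the authors' code, and every name in it (`restrict`, `grad`, `mge_gradient`, `relative_cost`, `make_sampler`, `full_multiscale`) is hypothetical. The "network" is a toy two-parameter pixelwise model `w * x + b` standing in for a CNN. Coarsening is assumed to be 2×2 average pooling, and a level-`l` sample is assumed to cost `4**-l` of a fine-mesh sample. The telescoping estimator uses E[g_0] = E[g_L] + Σ_{l<L} E[g_l − g_{l+1}]. Each difference term evaluates the same samples on two neighbouring meshes. The Full-Multiscale loop is simplified to plain single-scale SGD on each level, with the result hot-starting the next finer level.

```python
import numpy as np


def restrict(img, levels):
    """Coarsen a (batch, H, W) array by 2x2 average pooling, `levels` times."""
    for _ in range(levels):
        n, h, w = img.shape
        img = img.reshape(n, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
    return img


def grad(theta, x, y):
    """MSE gradient for a hypothetical toy model w * x + b (stand-in for a CNN)."""
    w, b = theta
    r = w * x + b - y
    return np.array([2.0 * np.mean(r * x), 2.0 * np.mean(r)])


def mge_gradient(theta, sample, batch_sizes, rng):
    """Telescoping estimate of the finest-mesh gradient E[g_0].

    batch_sizes is ordered finest (level 0) to coarsest (level L):
        E[g_0] = E[g_L] + sum_{l<L} E[g_l - g_{l+1}].
    Each term draws an independent batch; a difference term evaluates the
    SAME samples on two neighbouring meshes so that it has small variance.
    """
    L = len(batch_sizes) - 1
    x, y = sample(batch_sizes[L], rng)
    g = grad(theta, restrict(x, L), restrict(y, L))
    for l in range(L):
        x, y = sample(batch_sizes[l], rng)
        g = g + (grad(theta, restrict(x, l), restrict(y, l))
                 - grad(theta, restrict(x, l + 1), restrict(y, l + 1)))
    return g


def relative_cost(batch_sizes):
    """Cost in fine-sample units, assuming a level-l sample costs 4**-l."""
    L = len(batch_sizes) - 1
    cost = batch_sizes[L] * 4.0 ** (-L)
    for l in range(L):
        cost += batch_sizes[l] * (4.0 ** (-l) + 4.0 ** (-(l + 1)))
    return cost


def make_sampler(n=32, noise=0.1):
    """Hypothetical data: smooth clean images y and noisy inputs x = y + noise."""
    u = np.linspace(0.0, 1.0, n)
    base = np.sin(2 * np.pi * u)[:, None] * np.cos(2 * np.pi * u)[None, :]

    def sample(batch, rng):
        amp = rng.normal(1.0, 0.3, size=(batch, 1, 1))
        y = amp * base
        x = y + noise * rng.standard_normal((batch, n, n))
        return x, y

    return sample


def full_multiscale(theta, sample, steps, lr, batch, rng):
    """Simplified coarse-to-fine training: plain SGD on each level, coarsest
    first; the parameters reached on level l hot-start level l - 1.

    steps[l] is the number of SGD steps on level l (0 = finest).
    """
    for level in range(len(steps) - 1, -1, -1):
        for _ in range(steps[level]):
            x, y = sample(batch, rng)
            theta = theta - lr * grad(theta, restrict(x, level), restrict(y, level))
    return theta


rng = np.random.default_rng(0)
sample = make_sampler()
theta = np.array([0.5, 0.1])
print(mge_gradient(theta, sample, [4, 16, 64], rng))  # MGE estimate of the fine gradient
print(relative_cost([4, 16, 64]), relative_cost([64]))  # 14.0 64.0
theta_fm = full_multiscale(np.zeros(2), sample, [20, 50, 200], lr=0.5, batch=16, rng=rng)
```

With batch sizes `[4, 16, 64]` (finest to coarsest), the assumed cost is 4·1.25 + 16·0.3125 + 64·0.0625 = 14 fine-sample units, compared with 64 for a fine-only batch of 64. Whether the variance matches the single-scale estimator depends on the difference terms having small variance, which is the premise the abstract relies on. The `full_multiscale` schedule spends most steps on the coarsest level and few on the finest, mirroring the "hot-start" idea.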
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Denoising | Image Denoising Dataset (test) | MSE | 0.0839 | 10 |
| Image Inpainting | Image Inpainting Dataset (test) | SSIM | 91.12 | 10 |
| Image Deblurring | Image Deblurring Dataset (test) | MSE | 0.1156 | 10 |
| Image Super-resolution | Image Super-resolution Dataset (test) | SSIM | 79.82 | 10 |
| Denoising | CelebA | -- | -- | 8 |