Multiscale Training of Convolutional Neural Networks

About

Training convolutional neural networks (CNNs) on high-resolution images is often bottlenecked by the cost of evaluating gradients of the loss on the finest spatial mesh. To address this, we propose Multiscale Gradient Estimation (MGE), a Multilevel Monte Carlo-inspired estimator that expresses the expected gradient on the finest mesh as a telescoping sum of gradients computed on progressively coarser meshes. By assigning larger batches to the cheaper coarse levels, MGE achieves the same variance as single-scale stochastic gradient estimation while reducing the number of fine-mesh convolutions by a factor of 4 with each downsampling. We further embed MGE within a Full-Multiscale training algorithm that solves the learning problem on coarse meshes first and "hot-starts" the next finer level, cutting the required fine-mesh iterations by an additional order of magnitude. Extensive experiments on image denoising, deblurring, inpainting and super-resolution tasks using UNet, ResNet and ESPCN backbones confirm the practical benefits: Full-Multiscale reduces computational cost by 4-16× with no significant loss in performance. Together, MGE and Full-Multiscale offer a principled, architecture-agnostic route to accelerating CNN training on high-resolution data without sacrificing accuracy, and they can be combined with other variance-reduction techniques or learning-rate schedules to further improve scalability.
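The telescoping idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on loudly stated assumptions: the "network" is a single 3×3 convolution kernel shared by every mesh, restriction is 2×2 average pooling, the loss is mean-squared error on a hypothetical toy denoising task, and batch sizes grow 4× per coarsening. With level 0 as the finest mesh and level L as the coarsest, it uses E[∇L_0] = E[∇L_L] + Σ_{l=0}^{L-1} E[∇L_l − ∇L_{l+1}], evaluating each correction term on the same sample at two adjacent meshes. The coarse-to-fine loop is a simplified stand-in for Full-Multiscale training that runs plain SGD on each level. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def restrict(img: np.ndarray) -> np.ndarray:
    """Coarsen an (H, W) image by 2x2 average pooling (assumed restriction operator)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def loss_grad(w: np.ndarray, x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Gradient w.r.t. a 3x3 kernel w of 0.5 * mean((conv(x, w) - y_interior)**2)."""
    windows = sliding_window_view(x, w.shape)   # (H-2, W-2, 3, 3) patches
    z = np.einsum("pqij,ij->pq", windows, w)    # 'valid' cross-correlation
    r = z - y[1:-1, 1:-1]                       # residual on the interior pixels
    return np.einsum("pq,pqij->ij", r, windows) / r.size


def level_grad(w, x, y, level):
    """Gradient on the mesh reached by restricting (x, y) `level` times (0 = finest)."""
    for _ in range(level):
        x, y = restrict(x), restrict(y)
    return loss_grad(w, x, y)


def mge_gradient(w, sample, batch_sizes, rng):
    """Telescoping multilevel estimate of the expected finest-mesh gradient.

    batch_sizes[l] is the batch used for level l; the last entry is the coarsest
    level L and is assumed to be the largest, since its samples are cheapest.
    """
    L = len(batch_sizes) - 1
    # Coarsest term: plain Monte Carlo average of the level-L gradient.
    g = np.mean([level_grad(w, *sample(rng), L) for _ in range(batch_sizes[L])], axis=0)
    # Correction terms: the SAME sample evaluated on two adjacent meshes, so the
    # difference has small variance and needs only a small (expensive) batch.
    for l in range(L):
        diffs = []
        for _ in range(batch_sizes[l]):
            x, y = sample(rng)
            diffs.append(level_grad(w, x, y, l) - level_grad(w, x, y, l + 1))
        g = g + np.mean(diffs, axis=0)
    return g


def relative_cost(batch_sizes):
    """Work in units of one finest-mesh sample, assuming cost is proportional to pixel count."""
    L = len(batch_sizes) - 1
    cost = batch_sizes[L] * 4.0 ** -L
    for l in range(L):
        cost += batch_sizes[l] * (4.0 ** -l + 4.0 ** -(l + 1))
    return cost


def full_multiscale(w, sample, levels, steps, lr, rng):
    """Coarse-to-fine training: plain SGD on each mesh, hot-starting the next finer one."""
    for level in reversed(range(levels)):       # coarsest first, finest last
        for _ in range(steps):
            x, y = sample(rng)
            w = w - lr * level_grad(w, x, y, level)
    return w


def sample(rng, n=32):
    """Hypothetical toy denoising pair: a smooth clean image plus Gaussian noise."""
    t = np.linspace(0.0, 2.0 * np.pi, n)
    clean = np.outer(np.sin(rng.uniform(1, 3) * t), np.cos(rng.uniform(1, 3) * t))
    noisy = clean + 0.1 * rng.standard_normal((n, n))
    return noisy, clean


rng = np.random.default_rng(0)
w = np.zeros((3, 3))
g = mge_gradient(w, sample, batch_sizes=[4, 16, 64], rng=rng)
print(g.shape)                        # (3, 3)
print(relative_cost([4, 16, 64]))     # 14.0 finest-mesh-sample equivalents
w_fm = full_multiscale(w, sample, levels=3, steps=50, lr=0.1, rng=rng)
```

If every call to `sample` returns the same image pair, the telescoping sum collapses exactly to the finest-mesh gradient, which makes a quick sanity check. `relative_cost` counts work by pixel count only, so it ignores boundary effects and per-layer overheads.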

Shadab Ahamed, Niloufar Zakariaei, Eldad Haber, Moshe Eliasof • 2025

Related benchmarks

Task	Dataset	Result
Image Denoising	Image Denoising Dataset (test)	MSE 0.0839	10
Image Inpainting	Image Inpainting Dataset (test)	SSIM 91.12	10
Image Deblurring	Image Deblurring Dataset (test)	MSE 0.1156	10
Image Super-resolution	Image Super-resolution Dataset (test)	SSIM 79.82	10
Denoising	CelebA	--	8

