
NanoSD: Edge Efficient Foundation Model for Real Time Image Restoration

About

Latent diffusion models such as Stable Diffusion 1.5 offer strong generative priors that are highly valuable for image restoration, yet their full pipelines remain too computationally heavy for deployment on edge devices. Existing lightweight variants predominantly compress the denoising U-Net or reduce the diffusion trajectory, which disrupts the underlying latent manifold and limits generalization beyond a single task. We introduce NanoSD, a family of Pareto-optimal diffusion foundation models distilled from Stable Diffusion 1.5 through network surgery, feature-wise generative distillation, and structured architectural scaling jointly applied to the U-Net and the VAE encoder-decoder. This full-pipeline co-design preserves the generative prior while producing models that occupy distinct operating points along the accuracy-latency-size frontier (e.g., 130M-315M parameters, achieving real-time inference down to 20 ms on mobile-class NPUs). We show that parameter reduction alone does not correlate with hardware efficiency, and we provide an analysis revealing how architectural balance, feature routing, and latent-space preservation jointly shape true on-device latency. When used as a drop-in backbone, NanoSD enables state-of-the-art performance across image super-resolution, image deblurring, face restoration, and monocular depth estimation, outperforming prior lightweight diffusion models in both perceptual quality and practical deployability. NanoSD establishes a general-purpose diffusion foundation model family suitable for real-time visual generation and restoration on edge devices.

Subhajit Sanyal, Srinivas Soumitri Miriyala, Akshay Janardan Bankar, Manjunath Arveti, Sowmya Vajrala, Shreyas Pandith, Sravanth Kodavanti, Abhishek Ameta, Harshit, Amit Satish Unde • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Monocular Depth Estimation | NYU v2 (test) | Abs Rel | 7.2 | 257 |
| Monocular Depth Estimation | KITTI (test) | Abs Rel Error | 11.8 | 103 |
| Image Super-resolution | DIV2K v1 (val) | SSIM | 0.628 | 35 |
| Dehazing | RESIDE | FID | 35.23 | 25 |
| Deraining | real (test) | FID | 50.78 | 17 |
| Deblurring | RealBlur-J | FID | 52.41 | 17 |
| Desnowing | Realistic | FID | 34.83 | 17 |
| Face Restoration | CelebA synthetic (test) | LPIPS | 0.341 | 16 |
| Single Image Super-Resolution | DRealSR 46 (test) | LPIPS | 0.276 | 9 |
| Single Image Super-Resolution | RealSR 4 (test) | LPIPS | 0.272 | 9 |
Showing 10 of 13 rows
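
The page does not define the Abs Rel metric listed for the depth benchmarks. As a rough guide, the sketch below uses the definition commonly applied in monocular depth estimation: the mean of |predicted depth - ground-truth depth| / ground-truth depth over pixels with valid ground truth. The function name `abs_rel`, the validity rule (ground truth > 0), and the sample depths are assumptions for illustration and are not taken from the NanoSD paper. The listed values (7.2, 11.8) are presumably this ratio expressed as a percentage, which is also an assumption.

```python
def abs_rel(pred, gt):
    """Mean absolute relative error over pixels whose ground-truth depth is positive."""
    # Assumption: non-positive ground truth marks an invalid pixel and is skipped.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


# Hypothetical depths in metres, for illustration only (not benchmark data).
pred = [1.1, 2.0, 2.7, 4.4]
gt = [1.0, 2.0, 3.0, 4.0]

score = abs_rel(pred, gt)
print(f"Abs Rel = {score:.4f} ({100 * score:.1f}%)")
```

Lower is better for Abs Rel, FID, and LPIPS; higher is better for SSIM.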
