Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers
About
Recent advancements in deep learning have shown impressive results in image and video denoising, leveraging extensive pairs of noisy and noise-free data for supervision. However, the challenge of acquiring paired videos for dynamic scenes hampers the practical deployment of deep video denoising techniques. In contrast, this obstacle is less pronounced in image denoising, where paired data is more readily available. Thus, a well-trained image denoiser could serve as a reliable spatial prior for video denoising. In this paper, we propose a novel unsupervised video denoising framework, named "Temporal As a Plugin" (TAP), which integrates tunable temporal modules into a pre-trained image denoiser. By incorporating temporal modules, our method can harness temporal information across noisy frames, complementing its power of spatial denoising. Furthermore, we introduce a progressive fine-tuning strategy that refines each temporal module using the generated pseudo clean video frames, progressively enhancing the network's denoising performance. Compared to other unsupervised video denoising methods, our framework demonstrates superior performance on both sRGB and raw video denoising datasets.
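The plugin idea can be illustrated with a deliberately small toy, which is not the paper's architecture. It assumes 1-D "frames" stored as Python lists, a box filter standing in for the frozen pre-trained image denoiser, and a hypothetical `TemporalPlugin` whose single tunable weight `alpha` blends each frame with its temporal neighbours. In this toy the plugin is applied after the spatial denoiser. Setting `alpha` to zero makes the plugin an identity, so the pipeline starts out behaving exactly like the image denoiser. That starting point is an assumption of this sketch, not a detail stated in the abstract.

```python
from typing import List

Frame = List[float]  # a 1-D "frame" keeps the toy readable


def spatial_denoise(frame: Frame) -> Frame:
    """Stand-in for a frozen image denoiser (assumption: 3-tap box filter)."""
    n = len(frame)
    out = []
    for i in range(n):
        window = frame[max(0, i - 1): min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


class TemporalPlugin:
    """Hypothetical tunable module: blends a frame with its temporal neighbours."""

    def __init__(self, alpha: float = 0.0):
        # alpha = 0.0 makes the module an identity, so the image
        # denoiser's output passes through unchanged.
        self.alpha = alpha

    def __call__(self, frames: List[Frame]) -> List[Frame]:
        out = []
        for t, cur in enumerate(frames):
            # Previous, current, and next frame (fewer at the clip borders).
            neighbours = frames[max(0, t - 1): min(len(frames), t + 2)]
            mean = [sum(px) / len(neighbours) for px in zip(*neighbours)]
            out.append([(1 - self.alpha) * c + self.alpha * m
                        for c, m in zip(cur, mean)])
        return out


def denoise_video(frames: List[Frame], plugin: TemporalPlugin) -> List[Frame]:
    """Apply the fixed spatial prior per frame, then the temporal plugin."""
    return plugin([spatial_denoise(f) for f in frames])
```

In the framework the abstract describes, only the temporal modules are tuned, using pseudo clean frames produced by the network itself. Here that would correspond to adjusting `alpha` while leaving `spatial_denoise` untouched.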
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Denoising | Set8 | PSNR (sigma=10) | 38.02 | 12 |
| Video Denoising | DAVIS | PSNR (sigma=10) | 39.8 | 12 |
| Surgical Desmoking | Real-world surgical dataset | SSEQ | 16.646 | 10 |
| Video Denoising | CRVD indoor | PSNR (ISO 1600) | 48.85 | 7 |
| Video Denoising | RealisVideo-4K | PSNR (sigma=1.0) | 35.3 | 6 |
| Video Restoration | RealisVideo-4K 3840×2160 (test) | PSNR | 32.496 | 6 |
| Video Restoration | UVG 4K original | PSNR | 31.061 | 6 |
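For readers unfamiliar with the metric, most rows in the table above report PSNR, defined as 10·log10(MAX² / MSE) between a denoised frame and its clean reference. The following is a minimal sketch of that standard formula. The function name `psnr`, the flat-list frame representation, and the default peak value of 255 are assumptions made for illustration, and the benchmarks' own evaluation scripts may differ in details such as colour handling or per-frame averaging.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float],
         max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)
```

As a point of comparison, if sigma=10 denotes the noise standard deviation on a 0 to 255 scale (the usual convention, assumed here), the noisy input has an MSE of about 100 and therefore a PSNR of roughly 28.1 dB before any denoising.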