Total Variation Optimization Layers for Computer Vision
About
Optimization within a layer of a deep-net has emerged as a new direction for deep-net layer design. However, there are two main challenges when applying these layers to computer vision tasks: (a) which optimization problem within a layer is useful? (b) how do we ensure that computation within a layer remains efficient? To study question (a), in this work we propose total variation (TV) minimization as a layer for computer vision. Motivated by the success of total variation in image processing, we hypothesize that TV as a layer provides a useful inductive bias for deep-nets as well. We study this hypothesis on five computer vision tasks: image classification, weakly supervised object localization, edge-preserving smoothing, edge detection, and image denoising, improving over existing baselines. To achieve these results we had to address question (b): we developed a GPU-based projected-Newton method which is $37\times$ faster than existing solutions.
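To make the optimization problem behind the layer concrete, here is a minimal 1D sketch of TV denoising, $\min_x \tfrac{1}{2}\|x-y\|^2 + \lambda \sum_i |x_{i+1}-x_i|$. Note this is a standard projected-gradient solver on the dual (a textbook method), not the paper's GPU-based projected-Newton solver, and the function name is our own:

```python
import numpy as np

def tv_denoise_1d(y, lam, n_iter=500):
    """Solve min_x 0.5*||x - y||^2 + lam * sum_i |x[i+1] - x[i]|
    by projected gradient descent on the dual problem.
    (Illustrative sketch only; not the paper's projected-Newton method.)"""
    y = np.asarray(y, dtype=float)
    if len(y) < 2:
        return y.copy()
    u = np.zeros(len(y) - 1)   # dual variable, one entry per difference
    step = 0.25                # 1 / ||D D^T|| for the forward-difference operator D
    for _ in range(n_iter):
        # primal iterate recovered from the dual: x = y - D^T u
        dtu = np.concatenate(([-u[0]], u[:-1] - u[1:], [u[-1]]))
        x = y - dtu
        # dual gradient step, then projection onto the box [-lam, lam]
        u = np.clip(u + step * np.diff(x), -lam, lam)
    return x
```

For a large `lam` the output collapses to the mean of `y`; for `lam = 0` the input is returned unchanged, which matches the behavior of the TV objective at those extremes.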
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Denoising | CBSD68 (test) | PSNR | 31.26 | 92 |
| Image Classification | CIFAR10-C (test) | Accuracy (Gaussian) | 42.7 | 52 |
| Edge Detection | BIPED (test) | ODS | 87.4 | 31 |
| Image Denoising | Kodak24 (test) | PSNR | 32.15 | 26 |
| Color Image Denoising | McMaster (test) | PSNR | 32.32 | 19 |
| Edge Detection | MDBD (test) | ODS | 86.3 | 18 |
| Edge-preserving Image Smoothing | Edge-preserving image smoothing benchmark | WRMSE | 8.87 | 4 |