Variational image compression with a scale hyperprior
About
We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.
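The abstract says the hyperprior acts as side information that captures spatial dependencies in the latents. The sketch below shows one way that idea can turn into a bit count. It is a minimal, assumption-laden illustration, not the authors' implementation. It assumes each quantized latent `y_hat` is modeled as a zero-mean Gaussian convolved with a unit-width uniform, with a per-element scale `sigma` that the decoder recovers from the transmitted hyper-latents. All function names (`bin_probability`, `latent_bits`, `rd_loss`) are hypothetical. The side-information cost `side_info_bits` is simply passed in rather than modeled. The per-pixel normalization in `rd_loss` is likewise an assumption.

```python
import math
from typing import Sequence


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc so far-left tails stay accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bin_probability(y_hat: int, sigma: float, min_prob: float = 1e-12) -> float:
    """Mass of N(0, sigma^2) convolved with U(-1/2, 1/2), evaluated at integer y_hat.

    This equals the Gaussian mass on [y_hat - 1/2, y_hat + 1/2]. By symmetry we
    evaluate the bin at -|y_hat| so both CDF values lie in the lower tail.
    """
    y = -abs(y_hat)
    upper = std_normal_cdf((y + 0.5) / sigma)
    lower = std_normal_cdf((y - 0.5) / sigma)
    return max(upper - lower, min_prob)  # clamp to keep log2 finite


def latent_bits(y_hat: Sequence[int], sigmas: Sequence[float]) -> float:
    """Estimated code length (bits) of quantized latents given per-element scales."""
    return sum(-math.log2(bin_probability(y, s)) for y, s in zip(y_hat, sigmas))


def rd_loss(y_hat, sigmas, side_info_bits, num_pixels, mse, lam):
    """Hypothetical rate-distortion objective: bits per pixel + lam * MSE.

    side_info_bits stands in for the cost of coding the hyper-latents; here it
    is an input, not something this sketch estimates.
    """
    rate_bpp = (latent_bits(y_hat, sigmas) + side_info_bits) / num_pixels
    return rate_bpp + lam * mse
```

The design point is that the scales are not fixed. When the side information predicts a small `sigma` for a latent that is in fact zero, that latent costs well under one bit. A large nonzero latent under the same small scale becomes expensive. Learning the prior jointly with the autoencoder lets training trade these costs off against distortion.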
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Object Detection | COCO 2017 (val) | -- | -- | 2843 |
| Instance Segmentation | COCO 2017 (val) | -- | -- | 1275 |
| Watermarking | DiffusionDB | TPR @ 1% FPR (None) | 100 | 42 |
| Watermark Removal | MS-COCO | BA Attack Resilience | 62.1 | 40 |
| Image Compression | Kodak (test) | -- | -- | 35 |
| Image Compression | CLIC 2020 | -- | -- | 34 |
| Watermark Removal | CelebA-HQ LoRA, w/o te | CLIP-T Score | 0.2611 | 24 |
| Image Compression | Kodak 512 × 768 and 768 × 512 | Bits per pixel (bpp) | 0.211 | 16 |
| Image Compression | ImageNet-1k 224 × 224 | Bits per pixel (bpp) | 0.3338 | 16 |
| Watermark Verification | DiffusionDB (test) | TPR @ 1% FPR | 45.4 | 15 |
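The compression rows report rate in bits per pixel, and the abstract also refers to PSNR. The sketch below shows how those two quantities are commonly computed. It is an illustration under stated assumptions: rate is normalized by height × width (not by color channels), and PSNR assumes 8-bit images with a peak value of 255. The helper names are hypothetical. This is not the evaluation code behind the benchmark numbers above.

```python
import math
from typing import Sequence


def bits_per_pixel(total_bits: float, height: int, width: int) -> float:
    """Rate normalized by the number of pixels (not by channels)."""
    return total_bits / (height * width)


def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error over all samples (pixels x channels, flattened)."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(mse_value: float, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for the given peak sample value."""
    if mse_value == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse_value)
```

For example, a 512 × 768 image (393,216 pixels) coded in 196,608 bits has a rate of `bits_per_pixel(196608, 512, 768)`, which is 0.5 bpp.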