Variational image compression with a scale hyperprior

About

We describe an end-to-end trainable model for image compression based on variational autoencoders. The model incorporates a hyperprior to effectively capture spatial dependencies in the latent representation. This hyperprior relates to side information, a concept universal to virtually all modern image codecs, but largely unexplored in image compression using artificial neural networks (ANNs). Unlike existing autoencoder compression methods, our model trains a complex prior jointly with the underlying autoencoder. We demonstrate that this model leads to state-of-the-art image compression when measuring visual quality using the popular MS-SSIM index, and yields rate-distortion performance surpassing published ANN-based methods when evaluated using a more traditional metric based on squared error (PSNR). Furthermore, we provide a qualitative comparison of models trained for different distortion metrics.

Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, Nick Johnston • 2018
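
The abstract evaluates models with a squared-error metric (PSNR), and the benchmarks on this page report rate in bits per pixel (bpp). As a rough illustration of those two quantities, here is a minimal sketch using the standard definitions. The function names and the flat-list pixel representation are assumptions made for this example and are not taken from the paper or its code.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized flat sequences of pixel values."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE).

    max_value is the largest possible pixel value (255 for 8-bit images,
    which is assumed here as the default).
    """
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / err)


def bits_per_pixel(num_bytes, height, width):
    """Rate of a compressed image: total bits divided by the pixel count."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return 8.0 * num_bytes / (height * width)
```

For instance, under these definitions a 512 × 768 image stored in 49,152 bytes has a rate of exactly 1 bpp. Lower bpp at the same PSNR (or higher PSNR at the same bpp) indicates better rate-distortion performance.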

Related benchmarks

Task	Dataset	Result
Object Detection	COCO 2017 (val)	--	2930
Instance Segmentation	COCO 2017 (val)	--	1304
Image Compression	Kodak (test)	--	49
Watermarking	DiffusionDB	TPR @ 1% FPR (None) 100	42
Watermark Removal	MS-COCO	BA Attack Resilience 62.1	40
Image Compression	CLIC 2020	--	38
Watermark Removal	CelebA-HQ LoRA, w/o te	CLIP-T Score 0.2611	24
Image Compression	Kodak 512 × 768 and 768 × 512	Bits Per Pixel (bpp) 0.211	16
Image Compression	ImageNet-1k 224 × 224	bpp 0.3338	16
Watermark Verification	DiffusionDB (test)	TPR@1%FPR 45.4	15

Showing 10 of 22 rows
