Learned Image Compression with Mixed Transformer-CNN Architectures
About
Learned image compression (LIC) methods have made promising progress and achieve superior rate-distortion performance compared with classical image compression standards. Most existing LIC methods are convolutional neural network (CNN)-based or Transformer-based, and each has different advantages. Exploiting both is a direction worth exploring, which raises two challenges: 1) how to fuse the two approaches effectively, and 2) how to achieve higher performance at a suitable complexity. In this paper, we propose an efficient parallel Transformer-CNN Mixture (TCM) block with controllable complexity that incorporates the local modeling ability of CNNs and the non-local modeling ability of Transformers to improve the overall architecture of image compression models. In addition, inspired by recent progress in entropy estimation models and attention modules, we propose a channel-wise entropy model with parameter-efficient Swin-Transformer-based attention (SWAtten) modules that use channel squeezing. Experimental results demonstrate that our proposed method achieves state-of-the-art rate-distortion performance on three datasets of different resolutions (i.e., Kodak, Tecnick, and CLIC Professional Validation) compared to existing LIC methods. The code is available at https://github.com/jmliu206/LIC_TCM.
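The core idea of the TCM block is to run a CNN branch (local modeling) and a Transformer branch (non-local modeling) in parallel on a channel split of the feature map, then concatenate the results. The following is a minimal NumPy sketch of that parallel structure only; the branch internals (a single depthwise 3x3 convolution and one plain self-attention head) are simplified stand-ins, not the paper's actual residual-CNN and Swin-Transformer branches, and all function names here are illustrative.

```python
import numpy as np

def local_branch(x, kernel):
    """CNN stand-in: depthwise 3x3 convolution with zero padding (local modeling)."""
    C, H, W = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernel[:, i, j][:, None, None] * pad[:, i:i + H, j:j + W]
    return out

def global_branch(x):
    """Transformer stand-in: one self-attention head over all spatial positions
    (non-local modeling); every output token attends to every input token."""
    C, H, W = x.shape
    tokens = x.reshape(C, H * W).T                    # (HW, C) token matrix
    scores = tokens @ tokens.T / np.sqrt(C)           # (HW, HW) similarities
    scores -= scores.max(axis=1, keepdims=True)       # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    out = attn @ tokens                               # weighted mix of all tokens
    return out.T.reshape(C, H, W)

def tcm_block(x, kernel):
    """Parallel Transformer-CNN mixture: split channels in half, run both
    branches side by side, and concatenate along the channel axis."""
    C = x.shape[0]
    x_cnn, x_attn = x[: C // 2], x[C // 2:]
    return np.concatenate([local_branch(x_cnn, kernel),
                           global_branch(x_attn)], axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))       # (channels, height, width)
kernel = rng.standard_normal((4, 3, 3))  # one 3x3 filter per CNN-branch channel
y = tcm_block(x, kernel)
print(y.shape)  # same shape as the input: (8, 4, 4)
```

The channel split is what keeps the complexity controllable: widening or narrowing each half trades off how much capacity goes to local versus non-local modeling.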
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Object Detection | COCO 2017 (val) | -- | 2454 |
| Instance Segmentation | COCO 2017 (val) | -- | 1144 |
| Image Compression | Kodak | BD-Rate (PSNR): -11.73 | 50 |
| Image Compression | Tecnick | BD-Rate (PSNR): -11.47 | 36 |
| Image Compression | Kodak (test) | BD-Rate: -12.54 | 32 |
| Image Compression | CLIC Professional (val) | BD-Rate (PSNR): -6.04 | 26 |
| Image Compression | CLIC | BD-Rate (PSNR): -12 | 16 |
| Lossy Compression | TouchandGo | BD-Rate: -39.9 | 10 |
| Lossy Compression | ActiveCloth (test) | BD-Rate: -49.8 | 10 |
| Lossy Compression | ObjectFolder | BD-Rate: 23.7 | 10 |
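The BD-Rate values above are Bjøntegaard delta rates: the average bitrate difference (in percent) between two codecs at equal quality, where negative numbers mean the tested codec needs fewer bits than the anchor. A common way to compute it, sketched below with NumPy, is to fit a cubic polynomial of log-rate as a function of PSNR for each codec and integrate the gap over the overlapping PSNR range. The rate-distortion points are made up for illustration; they are not results from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%): average log-rate gap at equal PSNR.

    Fits log(rate) as a cubic polynomial of PSNR for each codec,
    integrates both fits over the overlapping PSNR interval, and
    converts the mean log-rate difference back to a percentage.
    """
    log_r_anchor = np.log(rate_anchor)
    log_r_test = np.log(rate_test)
    # Cubic fits: log-rate as a function of quality.
    p_anchor = np.polyfit(psnr_anchor, log_r_anchor, 3)
    p_test = np.polyfit(psnr_test, log_r_test, 3)
    # Overlapping PSNR interval of the two curves.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Definite integrals of each fitted polynomial over [lo, hi].
    int_anchor = np.polyval(np.polyint(p_anchor), hi) - np.polyval(np.polyint(p_anchor), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_log_diff = (int_test - int_anchor) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100

# Hypothetical RD points (bits per pixel, PSNR in dB): the test codec
# spends fewer bits than the anchor at every quality level.
anchor_bpp  = [0.20, 0.40, 0.80, 1.60]
anchor_psnr = [30.0, 33.0, 36.0, 39.0]
test_bpp    = [0.18, 0.35, 0.70, 1.40]
test_psnr   = [30.0, 33.0, 36.0, 39.0]

bd = bd_rate(anchor_bpp, anchor_psnr, test_bpp, test_psnr)
print(f"BD-Rate: {bd:.2f}%")  # negative: the test codec saves bits
```

A BD-Rate of -11.73 on Kodak, as in the table, therefore means the method needs about 11.73% fewer bits than the anchor codec for the same PSNR on average.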