Linear Attention Modeling for Learned Image Compression
About
In recent years, learned image compression has made tremendous progress and achieved impressive coding efficiency. Its coding gain comes mainly from non-linear neural network-based transforms and learnable entropy modeling. However, most studies focus on strong backbones, and few consider low-complexity designs. In this paper, we propose LALIC, a linear attention modeling approach for learned image compression. Specifically, we propose to use Bi-RWKV blocks, utilizing the Spatial Mix and Channel Mix modules to achieve more compact feature extraction, and apply the Conv-based Omni-Shift module to adapt to two-dimensional latent representations. Furthermore, we propose an RWKV-based Spatial-Channel ConTeXt model (RWKV-SCCTX) that leverages Bi-RWKV to model the correlation between neighboring features effectively. To our knowledge, our work is the first to utilize efficient Bi-RWKV models with linear attention for learned image compression. Experimental results demonstrate that our method achieves competitive rate-distortion (RD) performance, outperforming VTM-9.1 by -15.26%, -15.41%, and -17.63% in BD-rate on the Kodak, CLIC, and Tecnick datasets, respectively. The code is available at https://github.com/sjtu-medialab/RwkvCompress.
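To illustrate why linear attention scales linearly with the number of latent positions, here is a minimal sketch of generic kernelized linear attention in NumPy. It is not the Bi-RWKV Spatial Mix block from the paper; the feature map `elu(x) + 1` and the function names `feature_map`, `linear_attention`, and `quadratic_reference` are assumptions chosen for illustration only.

```python
import numpy as np

def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumed choice, not taken from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Non-causal (bidirectional) linear attention.

    q, k: arrays of shape (n, d); v: array of shape (n, d_v).
    Cost grows linearly in n because the (n, n) score matrix is never formed.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v              # (d, d_v) summary of all keys and values
    z = phi_k.sum(axis=0)         # (d,) normalizer summary
    return (phi_q @ kv) / (phi_q @ z)[:, None]

def quadratic_reference(q, k, v):
    """Same result computed the quadratic way, for comparison only."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = phi_q @ phi_k.T      # (n, n) pairwise similarities
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ v

# Hypothetical toy input: 64 latent positions, 8 channels.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 8)) for _ in range(3))
out = linear_attention(q, k, v)
```

Both functions produce the same output up to floating-point error; the linear form only reorders the matrix products so that keys and values are summarized once and reused for every query.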
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Compression | Kodak | BD-Rate (PSNR): -15.26 | 50 |
| Image Compression | Tecnick | BD-Rate (PSNR): -17.63 | 36 |
| Image Compression | Kodak (test) | -- | 32 |
| Image Compression | CLIC | BD-Rate (PSNR): -15.41 | 16 |
| Lossy Compression | TouchandGo | BD-Rate: -51.6 | 10 |
| Lossy compression performance | ActiveCloth (test) | BD-Rate: -54.8 | 10 |
| Lossy Compression | ObjectFolder | BD-Rate: 0.2 | 10 |
| Lossy Compression | YCB-Slide | BD-Rate: -4.6 | 10 |
| Lossy Compression | SSVTP | BD-Rate: 4.3 | 10 |
| Lossy Compression | ObjTac | BD-Rate: 32.8 | 10 |
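
BD-rate summarizes the average bitrate change of a codec relative to an anchor at equal quality, so negative values mean bitrate savings. The sketch below shows one common way to compute it, assuming the classic cubic-polynomial fit of log-rate against PSNR; the function name `bd_rate` and the sample rate-distortion points are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec vs. the anchor at equal PSNR."""
    psnr_a = np.asarray(psnr_anchor, dtype=float)
    psnr_t = np.asarray(psnr_test, dtype=float)
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(psnr_a, log_ra, 3)
    poly_t = np.polyfit(psnr_t, log_rt, 3)

    # Integrate both fits over the PSNR range covered by both curves.
    lo = max(psnr_a.min(), psnr_t.min())
    hi = min(psnr_a.max(), psnr_t.max())
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, converted to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Hypothetical RD points (bits per pixel, PSNR in dB), for illustration only.
anchor_bpp = [0.1, 0.2, 0.4, 0.8]
anchor_psnr = [30.0, 33.0, 36.0, 39.0]
test_bpp = [r / 2 for r in anchor_bpp]   # test codec needs half the bits
result = bd_rate(anchor_bpp, anchor_psnr, test_bpp, anchor_psnr)
```

In this toy case the test codec uses half the bitrate at every PSNR, so `result` comes out at approximately -50. Published BD-rate numbers may use a different interpolation scheme, so values from this sketch are not guaranteed to match a given leaderboard exactly.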