MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression
About
The latent representation in learned image compression encompasses channel-wise, local spatial, and global spatial correlations, which are essential for the entropy model to capture for conditional entropy minimization. Efficiently capturing these contexts within a single entropy model, especially in high-resolution image coding, presents a challenge due to the computational complexity of existing global context modules. To address this challenge, we propose the Linear Complexity Multi-Reference Entropy Model (MEM$^{++}$). Specifically, the latent representation is partitioned into multiple slices. For channel-wise contexts, previously compressed slices serve as the context for compressing a particular slice. For local contexts, we introduce a shifted-window-based checkerboard attention module. This module ensures linear complexity without sacrificing performance. For global contexts, we propose a linear complexity attention mechanism. It captures global correlations by decomposing the softmax operation, enabling the implicit computation of attention maps from previously decoded slices. Using MEM$^{++}$ as the entropy model, we develop the image compression method MLIC$^{++}$. Extensive experimental results demonstrate that MLIC$^{++}$ achieves state-of-the-art performance, reducing BD-rate by $13.39\%$ on the Kodak dataset compared to VTM-17.0 in Peak Signal-to-Noise Ratio (PSNR). Furthermore, MLIC$^{++}$ exhibits linear computational complexity and memory consumption with resolution, making it highly suitable for high-resolution image coding. Code and pre-trained models are available at https://github.com/JiangWeibeta/MLIC. Training dataset is available at https://huggingface.co/datasets/Whiteboat/MLIC-Train-100K.
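The global-context idea above, decomposing the softmax so the attention map is never formed explicitly, can be illustrated with a minimal NumPy sketch. It assumes one common decomposition: a softmax over the feature axis for queries and a softmax over the position axis for keys. The function names (`linear_attention`, `explicit_attention`) and tensor shapes are hypothetical and are not taken from the MLIC++ code base. In the setting described above, keys and values would come from previously decoded slices and queries from the slice being coded.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def linear_attention(q, k, v):
    """Attention with the softmax split into two separate normalizations.

    q: (n_q, d) queries, k: (n_k, d) keys, v: (n_k, d_v) values.
    The (n_q, n_k) attention map is never materialized, so cost and
    memory grow linearly with the number of positions.
    """
    q_norm = softmax(q, axis=1)  # normalize each query over its features
    k_norm = softmax(k, axis=0)  # normalize each feature over key positions
    context = k_norm.T @ v       # (d, d_v) summary of keys and values
    return q_norm @ context      # (n_q, d_v)


def explicit_attention(q, k, v):
    """Reference version that builds the full (n_q, n_k) attention map."""
    attn = softmax(q, axis=1) @ softmax(k, axis=0).T
    return attn @ v
```

Both functions return the same values because matrix multiplication is associative; only the order of evaluation differs. Each row of the implicit attention map still sums to one, so the output remains a weighted average of the values.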
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Compression | Kodak | BD-Rate (PSNR) | -15.09 | 58 |
| Image Compression | Tecnick | BD-Rate (PSNR) | -18.68 | 44 |
| Image Compression | CLIC | BD-Rate (PSNR) | -14.45 | 37 |
| Image Compression | CLIC Professional (val) | BD-Rate (PSNR) | -16.84 | 34 |
| Image Compression | Kodak (test) | -- | -- | 32 |
| Lossy Image Compression | Wind turbine image dataset full-resolution | BD-rate (PSNR) | 7.54 | 14 |
| Image Compression | T2 dataset | File Size (bytes) | 4.26e+3 | 8 |
| Image Compression | T1 | File Size (bytes) | 4.17e+3 | 8 |
| Image Compression | CLIC (test) | -- | -- | 8 |
| Image Compression | Tecnick original (test) | BD-Rate (MS-SSIM) | -53.14 | 7 |