
Linear Attention Modeling for Learned Image Compression

About

In recent years, learned image compression has made tremendous progress and achieved impressive coding efficiency. Its coding gain mainly comes from non-linear, neural-network-based transforms and learnable entropy modeling. However, most studies focus on a strong backbone, and few consider low-complexity design. In this paper, we propose LALIC, a linear attention model for learned image compression. Specifically, we use Bi-RWKV blocks, whose Spatial Mix and Channel Mix modules yield more compact feature extraction, and apply the convolution-based Omni-Shift module to adapt them to two-dimensional latent representations. Furthermore, we propose an RWKV-based Spatial-Channel ConTeXt model (RWKV-SCCTX) that leverages Bi-RWKV to effectively model the correlation between neighboring features. To our knowledge, ours is the first work to apply efficient Bi-RWKV models with linear attention to learned image compression. Experimental results demonstrate that our method achieves competitive RD performance, outperforming VTM-9.1 by -15.26%, -15.41%, and -17.63% BD-rate on the Kodak, CLIC, and Tecnick datasets, respectively. The code is available at https://github.com/sjtu-medialab/RwkvCompress .
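The "linear attention" in the title refers to attention variants whose cost grows linearly rather than quadratically with the number of tokens. As a minimal sketch (not the paper's Bi-RWKV formulation, and with an assumed feature map `phi`), the idea is to replace softmax(QKᵀ)V, which materializes an N×N matrix, with φ(Q)(φ(K)ᵀV), where φ(K)ᵀV is only d×d and is independent of the sequence length N:

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the (N, N) score matrix makes this O(N^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1.0):
    # Linear attention: phi is a positive feature map (assumed here to be
    # ReLU + 1). K^T V is (d, d), so the whole computation is O(N * d^2).
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                 # (d, d), independent of sequence length N
    z = Kp.sum(axis=0)            # normalizer, shape (d,)
    return (Qp @ kv) / (Qp @ z)[:, None]

rng = np.random.default_rng(0)
N, d = 64, 8
Q, K, V = rng.standard_normal((3, N, d))
out = linear_attention(Q, K, V)
assert out.shape == (N, d) and np.all(np.isfinite(out))
```

For image latents of shape H×W×C, N = H·W, so this linear scaling is what makes attention-style modeling affordable at high resolutions; RWKV-style blocks achieve the same linear cost with a recurrent formulation rather than the feature-map trick shown here.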

Donghui Feng, Zhengxue Cheng, Shen Wang, Ronghua Wu, Hongwei Hu, Guo Lu, Li Song • 2025

Related benchmarks

Task               | Dataset            | Metric         | Result | Rank
Image Compression  | Kodak              | BD-Rate (PSNR) | -15.26 | 50
Image Compression  | Tecnick            | BD-Rate (PSNR) | -17.63 | 36
Image Compression  | Kodak (test)       | --             | --     | 32
Image Compression  | CLIC               | BD-Rate (PSNR) | -15.41 | 16
Lossy Compression  | TouchandGo         | BD-Rate        | -51.6  | 10
Lossy Compression  | ActiveCloth (test) | BD-Rate        | -54.8  | 10
Lossy Compression  | ObjectFolder       | BD-Rate        | 0.2    | 10
Lossy Compression  | YCB-Slide          | BD-Rate        | -4.6   | 10
Lossy Compression  | SSVTP              | BD-Rate        | 4.3    | 10
Lossy Compression  | ObjTac             | BD-Rate        | 32.8   | 10

(Showing 10 of 17 rows)
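The BD-Rate numbers above are Bjøntegaard delta-rates: the average bitrate difference between two codecs at equal quality, so negative values mean the learned codec needs less bitrate than the anchor. A common way to compute it (a sketch with hypothetical rate-distortion points, using a cubic fit of log-rate vs. PSNR) is:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Approximate Bjøntegaard delta-rate in percent.

    Fits log-rate as a cubic polynomial of PSNR for each codec,
    integrates both fits over the overlapping PSNR range, and converts
    the average log-rate gap into a percentage. Negative = rate savings.
    """
    p_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(psnr_anchor.min(), psnr_test.min())
    hi = min(psnr_anchor.max(), psnr_test.max())
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical RD points: the test codec uses 10% less rate at every PSNR.
psnr = np.array([32.0, 34.0, 36.0, 38.0])
rate_anchor = np.array([0.2, 0.4, 0.8, 1.6])   # bits per pixel
rate_test = rate_anchor * 0.9
print(round(bd_rate(rate_anchor, psnr, rate_test, psnr), 1))  # → -10.0
```

A uniform 10% rate reduction therefore comes out as a -10% BD-rate, which is how to read a figure like -15.26% on Kodak: roughly 15% bitrate saved versus VTM-9.1 at the same PSNR.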
