Linear Attention Modeling for Learned Image Compression

About

In recent years, learned image compression has made tremendous progress and achieved impressive coding efficiency. Its coding gain mainly comes from non-linear neural network-based transforms and learnable entropy modeling. However, most studies focus on a strong backbone, and few consider a low-complexity design. In this paper, we propose LALIC, a linear attention modeling approach for learned image compression. Specifically, we propose to use Bi-RWKV blocks, utilizing the Spatial Mix and Channel Mix modules to achieve more compact feature extraction, and apply the Conv-based Omni-Shift module to adapt to the two-dimensional latent representation. Furthermore, we propose an RWKV-based Spatial-Channel ConTeXt model (RWKV-SCCTX) that leverages Bi-RWKV to model the correlation between neighboring features effectively. To our knowledge, our work is the first to utilize efficient Bi-RWKV models with linear attention for learned image compression. Experimental results demonstrate that our method achieves competitive RD performance, outperforming VTM-9.1 by -15.26%, -15.41%, and -17.63% in BD-rate on the Kodak, CLIC, and Tecnick datasets, respectively. The code is available at https://github.com/sjtu-medialab/RwkvCompress.
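To illustrate why linear attention gives a low-complexity design, here is a minimal NumPy sketch of generic kernelized linear attention. It is not the Bi-RWKV operator used by LALIC: the `elu(x) + 1` feature map, the function names, and the toy sizes are assumptions chosen for illustration. The sketch only shows that reassociating the matrix product avoids the N x N score matrix while producing the same output as the quadratic form.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a strictly positive feature map (an assumption for this sketch).
    return np.where(x > 0, x + 1.0, np.exp(x))

def quadratic_attention(q, k, v):
    # Builds the full N x N score matrix: O(N^2 * d) time, O(N^2) memory.
    scores = feature_map(q) @ feature_map(k).T
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def linear_attention(q, k, v):
    # Summarizes keys and values once, then reuses the summary for every
    # query: O(N * d^2) time and no N x N matrix.
    fq, fk = feature_map(q), feature_map(k)
    kv = fk.T @ v          # (d, d_v) key-value summary
    z = fk.sum(axis=0)     # (d,) normalizer summary
    return (fq @ kv) / (fq @ z)[:, None]

# Toy input: N = 64 tokens (e.g. a flattened 8 x 8 latent), d = 8 channels.
rng = np.random.default_rng(0)
n, d = 64, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

out_quadratic = quadratic_attention(q, k, v)
out_linear = linear_attention(q, k, v)
max_abs_diff = float(np.abs(out_quadratic - out_linear).max())
```

Both functions attend over all tokens in both directions. The two outputs agree up to floating-point rounding, and only the cost of computing them differs as N grows.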

Donghui Feng, Zhengxue Cheng, Shen Wang, Ronghua Wu, Hongwei Hu, Guo Lu, Li Song • 2025

Related benchmarks

Task	Dataset	Result
Video Compression	MCL-JCV	--	79
Image Compression	Kodak	BD-Rate (PSNR) -15.49	58
Image Compression	Tecnick	BD-Rate -17.71	53
Image Compression	CLIC	BD-Rate (PSNR) -15.47	37
Image Compression	Kodak (test)	--	35
Image Compression	CLIC Professional (val)	BD-Rate -15.41	34
Video Compression	HEVC Class B	BD-Rate -16.57	23
Video Compression	HEVC Class E	BD-Rate -20.6	23
Video Compression	HEVC Class C	BD-Rate -15.06	23
Video Compression	UVG	BD-Rate -16.92	23

Showing 10 of 25 rows
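The BD-Rate figures above summarize the average bitrate difference between two rate-distortion curves at equal quality, so a negative value means fewer bits than the anchor. The following is a minimal sketch of the common Bjøntegaard calculation (cubic fit of log-rate against PSNR, integrated over the overlapping PSNR range). The function name and the sample rate and PSNR points are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (in percent) of the test codec vs. the anchor."""
    rate_anchor, psnr_anchor = np.asarray(rate_anchor, float), np.asarray(psnr_anchor, float)
    rate_test, psnr_test = np.asarray(rate_test, float), np.asarray(psnr_test, float)

    # Fit log10(rate) as a cubic polynomial of PSNR for each curve.
    poly_anchor = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    poly_test = np.polyfit(psnr_test, np.log10(rate_test), 3)

    # Integrate both fits over the PSNR interval covered by both curves.
    lo = max(psnr_anchor.min(), psnr_test.min())
    hi = min(psnr_anchor.max(), psnr_test.max())
    int_anchor = np.polyint(poly_anchor)
    int_test = np.polyint(poly_test)
    area_anchor = np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_log_diff = (area_test - area_anchor) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0

# Hypothetical curves: the test codec spends 15% fewer bits at every PSNR point.
psnr_points = [30.0, 33.0, 36.0, 39.0]
anchor_bpp = [0.2, 0.4, 0.8, 1.6]
test_bpp = [0.85 * r for r in anchor_bpp]
bd = bd_rate(anchor_bpp, psnr_points, test_bpp, psnr_points)
```

Because the hypothetical test curve saves a constant 15% at every quality level, `bd` should come out close to -15, which is the same order as the values reported in the table.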
