
Rotation Equivariant Mamba for Vision Tasks

About

Rotation equivariance is one of the most general and important structural priors for visual data, yet it is notably absent from current Mamba-based vision architectures. Despite Mamba's success in natural language processing and its growing adoption in computer vision, existing visual Mamba models do not account for rotational symmetry in their design. This omission makes them inherently sensitive to image rotations, which limits their robustness and cross-task generalization. To address this limitation, we incorporate rotational symmetry, a universal and fundamental geometric prior in images, into Mamba-based architectures. Specifically, we introduce EQ-VMamba, the first rotation equivariant visual Mamba architecture for vision tasks. Its core components are a carefully designed rotation equivariant cross-scan strategy and group Mamba blocks. Moreover, we provide a rigorous theoretical analysis of the intrinsic equivariance error, demonstrating that the proposed architecture enforces end-to-end rotation equivariance. Extensive experiments across multiple benchmarks, including high-level image classification, mid-level semantic segmentation, and low-level image super-resolution, demonstrate that EQ-VMamba consistently improves rotation robustness and achieves superior or competitive performance compared to non-equivariant baselines while requiring approximately 50% fewer parameters. These results indicate that embedding rotation equivariance not only bolsters the robustness of visual Mamba models against rotations but also improves overall performance with significantly better parameter efficiency. Code is available at https://github.com/zhongchenzhao/EQ-VMamba.
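The equivariance property described above can be illustrated with a small NumPy sketch. This is not the EQ-VMamba code from the linked repository. It assumes a single-channel square feature map, only the four 90° rotations (the cyclic group C4), and a toy causal linear recurrence (`ssm_scan`) standing in for a Mamba block; all function names are hypothetical. `c4_cross_scan` scans all four rotated copies of the input with one shared sequence model and rotates each output back before summing, so rotating the input only permutes the four branches. `vmamba_style_cross_scan` mimics a four-direction scan (row-major, column-major and their reverses), whose set of scan paths is not closed under 90° rotation. `equivariance_error` is a simple relative-norm measure and need not match the paper's definition of equivariance error.

```python
import numpy as np


def ssm_scan(seq, a=0.9, b=1.0, c=1.0):
    """Toy causal recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    Hypothetical stand-in for a Mamba/S6 block (real selective SSMs make
    a, b, c input-dependent and operate on many channels).
    """
    h = 0.0
    out = np.empty_like(seq, dtype=float)
    for t, x_t in enumerate(seq):
        h = a * h + b * x_t
        out[t] = c * h
    return out


def scan_branch(x):
    """Row-major flatten -> shared 1D sequence model -> fold back to 2D."""
    return ssm_scan(x.reshape(-1)).reshape(x.shape)


def c4_cross_scan(x):
    """Scan all four 90-degree rotations with the same model, undo each rotation, sum."""
    return sum(np.rot90(scan_branch(np.rot90(x, k)), -k) for k in range(4))


def vmamba_style_cross_scan(x):
    """Four-direction scan: row-major, column-major, and both reversed."""
    row = scan_branch(x)
    # Reversing the row-major sequence equals scanning the 180-degree-rotated map.
    row_rev = scan_branch(x[::-1, ::-1])[::-1, ::-1]
    # Column-major order is row-major order of the transpose.
    col = scan_branch(x.T).T
    col_rev = scan_branch(x.T[::-1, ::-1])[::-1, ::-1].T
    return row + row_rev + col + col_rev


def equivariance_error(model, x):
    """Relative error ||model(rot90 x) - rot90 model(x)|| / ||rot90 model(x)||."""
    lhs = model(np.rot90(x))
    rhs = np.rot90(model(x))
    return np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
err_c4 = equivariance_error(c4_cross_scan, x)
err_base = equivariance_error(vmamba_style_cross_scan, x)
print(f"C4 cross-scan error:     {err_c4:.2e}")
print(f"VMamba-style scan error: {err_base:.2e}")
```

For the C4 version, rotating the input shifts which rotated copy each branch sees, and because the branches share one model, the summed output is simply rotated; the remaining error comes only from floating-point summation order. The four-direction baseline covers 180° rotation and transposition but not a 90° rotation, so its error is clearly nonzero. The abstract does not specify how EQ-VMamba's cross-scan and group Mamba blocks realize this, so treat the sketch as an illustration of the property, not of the method.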

Zhongchen Zhao, Qi Xie, Keyu Huang, Lei Zhang, Deyu Meng, Zongben Xu• 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | ADE20K | mIoU (%) | 39.9 | 1024 |
| Image super-resolution | Manga109 | PSNR (dB) | 40.34 | 821 |
| Image super-resolution | Set5 | PSNR (dB) | 38.59 | 692 |
| Semantic segmentation | Cityscapes | mIoU (%) | 80.36 | 658 |
| Image super-resolution | Set14 | PSNR (dB) | 34.76 | 506 |
| Image super-resolution | Urban100 | PSNR (dB) | 34.32 | 406 |
| Semantic segmentation | COCO Stuff | mIoU (%) | 38.69 | 379 |
| Image super-resolution | BSD100 | PSNR (dB) | 32.63 | 271 |
| Image classification | ImageNet-100 (val) | Top-1 accuracy (%) | 88.7 | 205 |
| Semantic segmentation | LoveDA | mIoU (%) | 45.69 | 166 |

Only 10 of the 13 benchmark results are shown.
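When reading the results above, it helps to recall how the two most common metrics are conventionally defined. The NumPy sketch below assumes 8-bit images (peak value 255) for PSNR, computed as 10·log10(MAX²/MSE) in dB, and computes mIoU from a class confusion matrix, reported as a percentage. The function names are hypothetical, and benchmark protocols differ in details the sketch ignores: super-resolution papers often evaluate on the Y channel with border cropping, and segmentation benchmarks usually accumulate the confusion matrix over the whole validation set and skip ignore labels.

```python
import numpy as np


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val**2 / MSE)."""
    mse = np.mean((np.asarray(pred, float) - np.asarray(target, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union (%) over classes present in pred or target."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # confusion[i, j] counts pixels with ground-truth class i predicted as class j
    confusion = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both prediction and ground truth
    return 100.0 * np.mean(intersection[valid] / union[valid])


# An error of exactly 1 grey level everywhere gives 20*log10(255), about 48.13 dB.
print(psnr(np.ones((4, 4)), np.zeros((4, 4))))
# Class IoUs are 1/2 and 2/3, so mIoU is about 58.33%.
print(mean_iou([0, 1, 1, 1], [0, 0, 1, 1], num_classes=2))
```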
