AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution

About

Visual autoregressive (VAR) models have recently emerged as a promising alternative for image generation, offering stable training, non-iterative inference, and high-fidelity synthesis through next-scale prediction. This encourages the exploration of VAR for image super-resolution (ISR), yet its application remains underexplored and faces two critical challenges: locality-biased attention, which fragments spatial structures, and residual-only supervision, which accumulates errors across scales. Together, these severely compromise the global consistency of reconstructed images. To address these issues, we propose AlignVAR, a globally consistent visual autoregressive framework tailored for ISR, featuring two key components: (1) Spatial Consistency Autoregression (SCA), which applies an adaptive mask to reweight attention toward structurally correlated regions, thereby mitigating excessive locality and enhancing long-range dependencies; and (2) Hierarchical Consistency Constraint (HCC), which augments residual learning with full reconstruction supervision at each scale, exposing accumulated deviations early and stabilizing the coarse-to-fine refinement process. Extensive experiments demonstrate that AlignVAR consistently enhances structural coherence and perceptual fidelity over existing generative methods, while delivering over 10x faster inference with nearly 50% fewer parameters than leading diffusion-based approaches, establishing a new paradigm for efficient ISR.

Cencen Liu, Dongyang Zhang, Wen Yin, Jielei Wang, Tianyu Li, Ji Guo, Wenbo Jiang, Guoqing Wang, Guoming Lu (1 and 2) ((1) University of Electronic Science and Technology of China, (2) Ubiquitous Intelligence and Trusted Services Key Laboratory of Sichuan Province) • 2026
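
The abstract describes HCC as adding full reconstruction supervision at each scale on top of residual learning. A minimal sketch of that idea follows. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `lam` weight, the use of mean squared error, and the simplification that every residual has already been resampled to a common length (so the reconstruction at a scale is a running sum of residuals) are all hypothetical.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hierarchical_consistency_loss(
    pred_residuals: List[List[float]],
    target_residuals: List[List[float]],
    lam: float = 1.0,
) -> float:
    """Sum over scales of a residual loss plus lam times a reconstruction loss.

    Assumption (hypothetical simplification): residuals at every scale share
    one length, so the reconstruction at scale k is the sum of residuals 0..k.
    """
    n = len(pred_residuals[0])
    pred_recon = [0.0] * n    # running reconstruction from predicted residuals
    target_recon = [0.0] * n  # running reconstruction from target residuals
    total = 0.0
    for pred_r, tgt_r in zip(pred_residuals, target_residuals):
        pred_recon = [p + r for p, r in zip(pred_recon, pred_r)]
        target_recon = [t + r for t, r in zip(target_recon, tgt_r)]
        residual_term = mse(pred_r, tgt_r)          # residual-only supervision
        recon_term = mse(pred_recon, target_recon)  # full reconstruction supervision
        total += residual_term + lam * recon_term
    return total


# Two scales, each residual off by the same small bias of 0.1.
pred = [[1.1, 1.1], [0.6, 0.6]]
target = [[1.0, 1.0], [0.5, 0.5]]
residual_only = hierarchical_consistency_loss(pred, target, lam=0.0)
with_recon = hierarchical_consistency_loss(pred, target, lam=1.0)
```

In this toy case the residual-only loss is about 0.02, since each scale looks only slightly wrong in isolation, while the combined loss is about 0.07, because the reconstruction term at the second scale sees the two biases added together. That is the sense in which per-scale reconstruction supervision can expose accumulated deviations early.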

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Image Super-resolution | RealSR | PSNR 26.11 | 130 |
| Image Super-resolution | DRealSR | MANIQA 0.4685 | 130 |
| Image Super-resolution | DIV2K (val) | LPIPS 0.2955 | 106 |
| Image Super-resolution | 512 x 512 resolution | Inference Time (s) 0.43 | 6 |