
NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering

About

Vision Foundation Models (VFMs) extract spatially downsampled representations, posing challenges for pixel-level tasks. Existing upsampling approaches face a fundamental trade-off: classical filters are fast and broadly applicable but rely on fixed forms, while modern upsamplers achieve superior accuracy through learnable, VFM-specific forms at the cost of retraining for each VFM. We introduce Neighborhood Attention Filtering (NAF), which bridges this gap by learning adaptive spatial-and-content weights through Cross-Scale Neighborhood Attention and Rotary Position Embeddings (RoPE), guided solely by the high-resolution input image. NAF operates zero-shot: it upsamples features from any VFM without retraining, making it the first VFM-agnostic architecture to outperform VFM-specific upsamplers and achieve state-of-the-art performance across multiple downstream tasks. It maintains high efficiency, scaling to 2K feature maps and reconstructing intermediate-resolution maps at 18 FPS. Beyond feature upsampling, NAF demonstrates strong performance on image restoration, highlighting its versatility. Code and checkpoints are available at https://github.com/valeoai/NAF.
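The core idea the abstract describes, producing high-resolution features by weighting each output pixel's low-resolution neighborhood with content-adaptive weights derived from the high-resolution input image, can be illustrated with a toy sketch. This is not the paper's architecture (NAF uses learned Cross-Scale Neighborhood Attention with RoPE); the function below, its name, and the similarity kernel are illustrative assumptions, closer to a joint-bilateral filter than to the actual model.

```python
import numpy as np

def neighborhood_upsample(feats, guide, k=3, tau=0.1):
    """Toy content-adaptive neighborhood upsampling (illustration only).

    feats: (h, w, C) low-resolution feature map
    guide: (H, W) high-resolution guidance image (grayscale)
    Each high-res pixel attends over a k x k neighborhood of low-res
    cells; weights come from a softmax over guidance similarity,
    standing in for the learned attention weights in NAF.
    """
    h, w, C = feats.shape
    H, W = guide.shape
    sy, sx = H / h, W / w
    # Guidance sampled near low-res cell centers (nearest-pixel lookup).
    gy = np.clip(((np.arange(h) + 0.5) * sy).astype(int), 0, H - 1)
    gx = np.clip(((np.arange(w) + 0.5) * sx).astype(int), 0, W - 1)
    guide_lr = guide[gy][:, gx]          # (h, w) guidance at low-res grid
    out = np.zeros((H, W, C))
    r = k // 2
    for i in range(H):
        for j in range(W):
            ci, cj = int(i / sy), int(j / sx)        # nearest low-res cell
            y0, y1 = max(ci - r, 0), min(ci + r + 1, h)
            x0, x1 = max(cj - r, 0), min(cj + r + 1, w)
            # Content similarity between this pixel and neighborhood cells.
            diff = np.abs(guide_lr[y0:y1, x0:x1] - guide[i, j])
            wts = np.exp(-diff / tau)
            wts /= wts.sum()                         # softmax-style weights
            out[i, j] = (wts[..., None] * feats[y0:y1, x0:x1]).sum((0, 1))
    return out
```

A fixed Gaussian kernel here would reproduce a classical filter; NAF's contribution is replacing this hand-crafted similarity with attention weights learned once and applied zero-shot to any VFM's features.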

Loick Chambon, Paul Couairon, Eloi Zablocki, Alexandre Boulch, Nicolas Thome, Matthieu Cord • 2025

Related benchmarks

Task                   Dataset        Result       Rank
Semantic segmentation  ADE20K         42.7 mIoU    366
Semantic segmentation  Pascal VOC     84.52 mIoU   129
Semantic segmentation  COCO           62.18 mIoU   103
Depth Estimation       NYU V2         --           57
Depth Estimation       NYU v2 (val)   --           53
