AnyUp: Universal Feature Upsampling
About
We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an inference-time feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.
Thomas Wimmer, Prune Truong, Marie-Julie Rakotosaona, Michael Oechsle, Federico Tombari, Bernt Schiele, Jan Eric Lenssen • 2025
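To make the task concrete: feature upsampling takes a coarse feature map of shape (C, h, w) from a vision encoder and produces a map of shape (C, H, W) at a higher resolution. The sketch below is a plain bilinear-interpolation baseline in pure Python, not AnyUp's architecture, which the abstract does not detail. The function name `bilinear_upsample`, the nested-list layout, and the half-pixel sampling convention are assumptions made for illustration.

```python
def bilinear_upsample(feat, out_h, out_w):
    """Naive bilinear upsampling (illustrative baseline, NOT AnyUp).

    feat: list of C channels, each an h x w nested list of floats.
    Returns a list of C channels, each out_h x out_w.
    """
    h, w = len(feat[0]), len(feat[0][0])
    out = []
    for channel in feat:
        rows = []
        for i in range(out_h):
            # Map the output pixel center back to input coordinates,
            # clamping so border pixels replicate the edge values.
            y = min(max((i + 0.5) * h / out_h - 0.5, 0.0), h - 1.0)
            y0 = int(y)
            y1 = min(y0 + 1, h - 1)
            wy = y - y0
            row = []
            for j in range(out_w):
                x = min(max((j + 0.5) * w / out_w - 0.5, 0.0), w - 1.0)
                x0 = int(x)
                x1 = min(x0 + 1, w - 1)
                wx = x - x0
                # Blend the four neighbouring feature values.
                top = channel[y0][x0] * (1 - wx) + channel[y0][x1] * wx
                bottom = channel[y1][x0] * (1 - wx) + channel[y1][x1] * wx
                row.append(top * (1 - wy) + bottom * wy)
            rows.append(row)
        out.append(rows)
    return out


# Usage: one channel, 2 x 2 features upsampled to 4 x 4.
coarse = [[[0.0, 1.0], [2.0, 3.0]]]
fine = bilinear_upsample(coarse, 4, 4)
```

A fixed interpolation like this works for any encoder because it ignores the image content. Learned upsamplers instead use the high-resolution image as guidance, which is where the per-encoder retraining described in the abstract comes in.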
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic segmentation | ADE20K (val) | mIoU | 42.25 | 2731 |
| Semantic segmentation | PASCAL VOC (val) | mIoU | 84.33 | 338 |
| Semantic segmentation | COCO Stuff | mIoU | 62.16 | 195 |
| Semantic segmentation | Pascal VOC | mIoU | 0.84 | 172 |
| Semantic segmentation | COCO Stuff (val) | mIoU | 62.08 | 126 |
| Monocular Depth Estimation | NYU V2 | Delta 1 Acc | 92.33 | 113 |
| Semantic segmentation | VOC | mIoU | 84 | 44 |
| Semantic segmentation | ADE20K | mIoU | 42.43 | 30 |
| Surface Normal Estimation | NYU V2 | RMSE | 31.17 | 23 |
| Depth Estimation | COCO (val) | δ1 | 61.32 | 9 |
Showing 10 of 12 rows
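For reference, the metrics named in the table can be sketched as follows. These are the conventional textbook definitions written in pure Python over flat lists of per-pixel values. The exact evaluation protocols behind the listed numbers are not given on this page, so the function names, the 1.25 threshold for δ1, and the choice to skip classes absent from both prediction and ground truth are assumptions. For surface normals the per-pixel quantity is typically an angular error, which this generic `rmse` does not compute itself.

```python
import math


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (returns a fraction)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: skip classes absent from both inputs
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def delta1_accuracy(pred, target, threshold=1.25):
    """Fraction of pixels whose depth ratio max(p/t, t/p) is below threshold."""
    hits = sum(1 for p, t in zip(pred, target) if max(p / t, t / p) < threshold)
    return hits / len(pred)


def rmse(pred, target):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Usage on toy data (labels for segmentation, positive depths for delta1).
miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
d1 = delta1_accuracy([1.0, 2.0, 4.0], [1.1, 2.0, 2.0])
err = rmse([1.0, 2.0], [1.0, 4.0])
```

All three functions return raw values rather than percentages. Most table entries appear to be on a 0 to 100 scale (for example mIoU 84.33), while the Pascal VOC row lists 0.84, so values from different rows may need rescaling before comparison.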