AnyUp: Universal Feature Upsampling

About

We introduce AnyUp, a method for feature upsampling that can be applied to any vision feature at any resolution, without encoder-specific training. Existing learning-based upsamplers for features like DINO or CLIP need to be re-trained for every feature extractor and thus do not generalize to different feature types at inference time. In this work, we propose an inference-time feature-agnostic upsampling architecture to alleviate this limitation and improve upsampling quality. In our experiments, AnyUp sets a new state of the art for upsampled features, generalizes to different feature types, and preserves feature semantics while being efficient and easy to apply to a wide range of downstream tasks.
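For readers unfamiliar with the task, the following is a minimal sketch of what "feature upsampling" means, using plain bilinear interpolation as a training-free baseline. It is not the AnyUp architecture, which this page does not describe; the function name `bilinear_upsample`, the nested-list layout `feats[y][x][c]`, and the align-corners sampling convention are all assumptions made for illustration.

```python
def bilinear_upsample(feats, out_h, out_w):
    """Bilinearly upsample a feature map stored as feats[y][x] = [C values].

    Hypothetical baseline, not AnyUp. Uses align-corners-style sampling:
    the corner features of the input land on the corners of the output.
    """
    in_h, in_w = len(feats), len(feats[0])
    channels = len(feats[0][0])
    out = []
    for oy in range(out_h):
        # Map the output row to a fractional input row.
        fy = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for ox in range(out_w):
            # Map the output column to a fractional input column.
            fx = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            pixel = []
            for c in range(channels):
                # Blend horizontally on the two neighbouring rows, then vertically.
                top = (1 - wx) * feats[y0][x0][c] + wx * feats[y0][x1][c]
                bottom = (1 - wx) * feats[y1][x0][c] + wx * feats[y1][x1][c]
                pixel.append((1 - wy) * top + wy * bottom)
            row.append(pixel)
        out.append(row)
    return out


# A 2x2 single-channel "feature map" upsampled to 3x3.
coarse = [[[0.0], [2.0]], [[4.0], [6.0]]]
fine = bilinear_upsample(coarse, 3, 3)
```

Interpolation of this kind is encoder-agnostic but ignores the image content. Learned upsamplers, as the abstract notes, have typically needed retraining for each feature extractor.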

Thomas Wimmer, Prune Truong, Marie-Julie Rakotosaona, Michael Oechsle, Federico Tombari, Bernt Schiele, Jan Eric Lenssen • 2025

Related benchmarks

Task	Dataset	Result
Semantic segmentation	ADE20K (val)	mIoU 42.25	3069
Semantic segmentation	ADE20K	mIoU 42.26	559
Semantic segmentation	COCO Stuff	mIoU 62.16	399
Semantic segmentation	PASCAL VOC (val)	mIoU 84.33	380
Semantic segmentation	Pascal VOC	mIoU 0.84	280
Monocular Depth Estimation	NYU V2	Delta 1 Acc 92.33	174
Semantic segmentation	COCO Stuff (val)	mIoU 62.08	167
Depth Estimation	NYU V2	--	167
Semantic segmentation	Pascal VOC	mIoU 83.85	159
Video Object Segmentation	DAVIS	J & F Mean 72.44	128

Showing 10 of 21 rows
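As a reading aid for the metrics in the table, here is a hedged sketch of how mIoU and Delta 1 accuracy are commonly computed. The function names, the flat-list inputs, the `ignore_index=255` convention, and the 1.25 threshold are assumptions based on common practice, not details taken from this page; actual benchmark protocols usually accumulate counts over a whole dataset and may differ in other ways.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over classes that occur at all.

    pred and target are flat lists of integer class labels
    (hypothetical layout); labels must be below num_classes.
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip unlabeled pixels
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


def delta1_accuracy(pred_depth, gt_depth, threshold=1.25):
    """Fraction of valid pixels with max(pred/gt, gt/pred) < threshold.

    Pixels with non-positive ground truth or prediction are skipped.
    """
    ratios = [
        max(p / g, g / p)
        for p, g in zip(pred_depth, gt_depth)
        if g > 0 and p > 0
    ]
    if not ratios:
        return float("nan")
    return sum(r < threshold for r in ratios) / len(ratios)


miou = mean_iou([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
d1 = delta1_accuracy([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 4.0, 4.0])
```

Both functions return fractions in [0, 1]. Most rows in the table appear to report percentages, while one Pascal VOC row (0.84) appears to use the fractional scale.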
