
Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models

About

Reliable zero-shot detection of out-of-distribution (OOD) inputs is critical for deploying vision-language models in open-world settings. However, the lack of labeled negatives in zero-shot OOD detection necessitates proxy signals that remain effective under distribution shift. Existing negative-label methods rely on a fixed set of textual proxies, which (i) sparsely sample the semantic space beyond in-distribution (ID) classes and (ii) remain static while only visual features drift, leading to cross-modal misalignment and unstable predictions. In this paper, we propose CoEvo, a training- and annotation-free test-time framework that performs bidirectional, sample-conditioned adaptation of both textual and visual proxies. Specifically, CoEvo introduces a proxy-aligned co-evolution mechanism to maintain two evolving proxy caches, which dynamically mines contextual textual negatives guided by test images and iteratively refines visual proxies, progressively realigning cross-modal similarities and enlarging local OOD margins. Finally, we dynamically re-weight the contributions of dual-modal proxies to obtain a calibrated OOD score that is robust to distribution shift. Extensive experiments on standard benchmarks demonstrate that CoEvo achieves state-of-the-art performance, improving AUROC by 1.33% and reducing FPR95 by 45.98% on ImageNet-1K compared to strong negative-label baselines.
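
The negative-label scoring that the abstract builds on can be illustrated with a minimal sketch. It is not CoEvo's implementation: the proxy co-evolution and cache updates are omitted, and the function names, the temperature value, and the fixed convex weight used to fuse the textual and visual scores are all assumptions made for illustration. Features are assumed to be L2-normalized, so dot products are cosine similarities.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def negative_label_score(image_feat, id_proxies, neg_proxies, temperature=0.01):
    """Share of softmax mass assigned to ID proxies (higher means more ID-like).

    image_feat:  (d,) L2-normalized image embedding
    id_proxies:  (n_id, d) L2-normalized proxies for in-distribution classes
    neg_proxies: (n_neg, d) L2-normalized negative (OOD) proxies
    temperature: hypothetical value, chosen only for this sketch
    """
    sims = np.concatenate([id_proxies @ image_feat, neg_proxies @ image_feat])
    probs = softmax(sims / temperature)
    return float(probs[: len(id_proxies)].sum())


def fused_score(text_score, visual_score, weight):
    # Assumed convex combination; a stand-in for the paper's dynamic re-weighting.
    return weight * text_score + (1.0 - weight) * visual_score


# Toy proxies: two ID classes and one negative proxy in a 3-D feature space.
id_proxies = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
neg_proxies = np.array([[0.0, 0.0, 1.0]])

id_like = negative_label_score(np.array([1.0, 0.0, 0.0]), id_proxies, neg_proxies)
ood_like = negative_label_score(np.array([0.0, 0.0, 1.0]), id_proxies, neg_proxies)
```

In this toy setup the image aligned with an ID proxy receives a score near 1 and the image aligned with the negative proxy receives a score near 0. The same scoring function could be applied to a textual proxy set and a visual proxy set separately, with `fused_score` combining the two results.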

Hao Tang, Yu Liu, Shuanglin Yan, Fei Shen, Shengfeng He, Jing Qin • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| OOD Detection | ImageNet-1K OOD (Average: OpenImage-O, Texture, iNaturalist, ImageNet-O) 1.0 (test) | AUROC 97.95 | 61 |
| OOD Detection | ImageNet 1k (test) | FPR95 10.22 | 49 |
| OOD Detection | ImageNet SUN | FPR@95 4.42 | 43 |
| Out-of-Distribution Detection | OpenOOD Far-OoD average v1.5 | AUROC 96.7 | 39 |
| Out-of-Distribution Detection | OpenOOD Near-OoD average v1.5 | AUROC 0.7537 | 39 |
| OOD Detection | ImageNet-1k ID Places OOD | AUROC 95.8 | 35 |
| Out-of-Distribution Detection | ImageNet-1K (ID) vs Textures (OOD) (test) | FPR95 12.42 | 34 |
| OOD Detection | iNaturalist (OOD) / ImageNet-1k (ID) 1.0 (test) | FPR95 0.46 | 33 |
| Image Classification | ImageNet-1K ID | Accuracy 67.36 | 12 |
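
For readers unfamiliar with the two OOD metrics reported in these benchmarks, the following sketch shows one common way to compute them from detector scores. It assumes the convention that ID is the positive class and that a higher score means "more in-distribution"; the function names are illustrative, and benchmark suites may differ in details such as tie handling or threshold interpolation.

```python
import numpy as np


def auroc(id_scores, ood_scores):
    """Probability that a random ID sample outscores a random OOD sample.

    Ties count as half a win. This pairwise form is O(n_id * n_ood) in memory,
    which is fine for a small illustration but not for large test sets.
    """
    id_col = np.asarray(id_scores, dtype=float)[:, None]
    ood_row = np.asarray(ood_scores, dtype=float)[None, :]
    wins = (id_col > ood_row).mean()
    ties = (id_col == ood_row).mean()
    return float(wins + 0.5 * ties)


def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID at the threshold that keeps
    roughly 95% of ID samples (the 5th percentile of the ID scores)."""
    threshold = np.percentile(np.asarray(id_scores, dtype=float), 5)
    return float((np.asarray(ood_scores, dtype=float) >= threshold).mean())


# Toy scores: one OOD sample (0.65) overlaps the ID score range.
id_scores = [0.9, 0.8, 0.7, 0.6]
ood_scores = [0.1, 0.2, 0.3, 0.65]

auroc_value = auroc(id_scores, ood_scores)
fpr95_value = fpr_at_95_tpr(id_scores, ood_scores)
```

Higher AUROC is better and lower FPR95 is better. Both functions return fractions in [0, 1], whereas most rows above appear to report percentages.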