
Love Me, Love My Label: Rethinking the Role of Labels in Prompt Retrieval for Visual In-Context Learning

About

Visual in-context learning (VICL) enables visual foundation models to handle multiple tasks by steering them with demonstrative prompts. VICL performance depends heavily on which prompts are chosen, which makes prompt selection a key challenge. Prior work has made substantial progress on prompt retrieval and reranking strategies, but focuses mainly on prompt images and overlooks their labels. We show that these approaches sometimes retrieve prompts that are visually similar to the query but label-inconsistent, which can degrade VICL performance. Conversely, higher label consistency between query and prompts tends to indicate stronger VICL results. Motivated by these findings, we develop a framework named LaPR (Label-aware Prompt Retrieval), which highlights the role of labels in prompt selection. Our framework first designs an image-label joint representation for prompts to incorporate label cues explicitly. In addition, because query labels are unavailable at test time, we introduce a mixture-of-experts mechanism into the dual encoders with query-adaptive routing. Each expert is expected to capture a specific label mode, while the router infers query-adaptive mixture weights and helps to learn a label-aware representation. We carefully design an alternating optimization scheme for the experts and the router, with a VICL performance-guided contrastive loss and a label-guided contrastive loss, respectively. Extensive experiments show promising and consistent improvements from LaPR on in-context segmentation, detection, and colorization tasks. Moreover, LaPR generalizes well across feature extractors and cross-fold scenarios, suggesting the importance of label utilization in prompt retrieval for VICL. Code is available at https://github.com/luotc-why/CVPR26-LaPR.
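The retrieval idea described above can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it is a toy, pure-Python illustration under several assumptions: experts are plain linear maps, the joint image-label representation is a simple concatenation, the router is a single linear layer followed by a softmax, and the query-derived mixture weights are applied to both encoders. All function and parameter names (`embed_query`, `embed_prompt`, `retrieve`, `router`, and so on) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def mix_experts(vec, experts, weights):
    """Weighted sum of the expert outputs (each expert is a linear map)."""
    outputs = [matvec(expert, vec) for expert in experts]
    dim = len(outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, outputs)) for d in range(dim)]


def embed_query(img_feat, query_experts, router):
    """Assumed router: linear scores on the query image feature, then softmax."""
    weights = softmax(matvec(router, img_feat))
    return mix_experts(img_feat, query_experts, weights), weights


def embed_prompt(img_feat, label_feat, prompt_experts, weights):
    """Assumed joint representation: concatenate image and label features."""
    joint = list(img_feat) + list(label_feat)
    return mix_experts(joint, prompt_experts, weights)


def retrieve(query_img, prompts, query_experts, prompt_experts, router, k=1):
    """Return indices of the k prompts most similar to the query embedding."""
    query_emb, weights = embed_query(query_img, query_experts, router)
    scored = []
    for idx, (img_feat, label_feat) in enumerate(prompts):
        prompt_emb = embed_prompt(img_feat, label_feat, prompt_experts, weights)
        scored.append((cosine(query_emb, prompt_emb), idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]
```

Only the query image is needed to compute the mixture weights, which mirrors the constraint that query labels are unavailable at test time; the training objectives (the two contrastive losses and the alternating optimization) are not covered by this sketch.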

Tianci Luo, Haohao Pan, Jinpeng Wang, Niu Lian, Xinrui Chen, Bin Chen, Shu-Tao Xia, Chun Yuan • 2026

Related benchmarks

Task | Dataset | Result | Rank
Foreground segmentation | Pascal-5i (3) | mIoU 38.3 | 25
Foreground segmentation | Pascal-5i Fold-1 (test) | mIoU 47.44 | 25
Foreground segmentation | Pascal-5i Fold-0 (test) | mIoU 42.81 | 25
Single Object Detection | PASCAL VOC 2012 (test) | mIoU 34.64 | 24
Image Colorization | ImageNet 1k (test) | MSE 0.6 | 17
Foreground segmentation | Pascal 5^2 Avg | mIoU 42.27 | 15
Foreground segmentation | PASCAL-5i (fold-2) | mIoU 40.52 | 12
