
Test-Time Visual In-Context Tuning

About

Visual in-context learning (VICL), a new paradigm in computer vision, allows a model to rapidly adapt to various tasks with only a handful of prompt examples. While effective, the existing VICL paradigm generalizes poorly under distribution shifts. In this work, we propose test-time Visual In-Context Tuning (VICT), a method that adapts VICL models on the fly with a single test sample. Specifically, we flip the roles of the task prompt and the test sample, and use a cycle-consistency loss to reconstruct the original task prompt output. Our key insight is that a model should be aware of a new test distribution if it can successfully recover the original task prompts. Extensive experiments on six representative vision tasks, ranging from high-level visual understanding to low-level image processing, with 15 common corruptions, demonstrate that VICT improves the generalizability of VICL to unseen new domains. In addition, we show the potential of applying VICT to unseen tasks at test time. Code: https://github.com/Jiahao000/VICT.
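The prompt/test role flip and cycle-consistency loss described above can be sketched in a toy setting. This is a minimal illustration, not the paper's implementation: the "model" here is a hypothetical kernel-attention map over the prompt pair parameterised by a matrix `W`, and the gradient is taken by finite differences in place of autograd. All names (`icl_predict`, `vict_step`, `cycle_loss`) are invented for this sketch.

```python
import numpy as np

D = 4  # toy feature dimension

def icl_predict(W, prompt_x, prompt_y, query_x):
    """Hypothetical stand-in for a visual in-context model: softmax
    "attention" from queries to prompt inputs, copying prompt outputs."""
    logits = query_x @ W @ prompt_x.T              # (n_query, n_prompt) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ prompt_y

def cycle_loss(W, prompt_x, prompt_y, test_x):
    # Forward pass: predict the test output from the task prompt.
    y_test = icl_predict(W, prompt_x, prompt_y, test_x)
    # Flip roles: (test_x, y_test) becomes the prompt; try to
    # reconstruct the original prompt output from the prompt input.
    y_rec = icl_predict(W, test_x, y_test, prompt_x)
    return float(((y_rec - prompt_y) ** 2).mean())

def vict_step(W, prompt_x, prompt_y, test_x, lr=0.05, eps=1e-5):
    """One test-time tuning step on a single test sample: descend the
    cycle-consistency loss (finite-difference gradient for brevity)."""
    base = cycle_loss(W, prompt_x, prompt_y, test_x)
    grad = np.zeros_like(W)
    for i in range(D):
        for j in range(D):
            Wp = W.copy()
            Wp[i, j] += eps
            grad[i, j] = (cycle_loss(Wp, prompt_x, prompt_y, test_x) - base) / eps
    return W - lr * grad, base
```

In a real VICT setup the prompt and query would be images, the model a large in-context vision model, and the tuning step an autograd update applied per test sample before prediction; the cycle structure (predict, flip, reconstruct, descend) is the same.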

Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele • 2025

Related benchmarks

| Task                  | Dataset                 | Metric                      | Result | Rank |
|-----------------------|-------------------------|-----------------------------|--------|------|
| Semantic segmentation | ADE20K-C                | Gauss. Error                | 24.4   | 21   |
| Depth estimation      | NYU C v2                | A.Rel (Brightness)          | 0.083  | 16   |
| Deraining             | Rain-C level 5 (test)   | Brightness corruption score | 17.38  | 8    |
| Low-light enhancement | LoL-C Level 1 1.0 (test)| Brightness                  | 20.72  | 8    |
| Panoptic segmentation | COCO-C                  | Brightness score            | 42.9   | 8    |
| Panoptic segmentation | COCO-C level 5          | Brightness score            | 38.7   | 8    |
| Semantic segmentation | ADE20K-C level 1        | mIoU (Brightness)           | 49.2   | 8    |
| Colorization          | ImageNet (test val)     | PSNR                        | 20.04  | 4    |
| Deblurring            | GoPro (test val)        | PSNR                        | 24.5   | 4    |
| Denoising             | SIDD-C (level 1)        | PSNR (Bright)               | 28.07  | 4    |

Showing 10 of 44 rows.

Other info

Code: https://github.com/Jiahao000/VICT
