
COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation

About

Recent vision-language models (VLMs) face significant challenges in test-time adaptation to novel domains. While cache-based methods show promise by leveraging historical information, they struggle with both caching unreliable feature-label pairs and indiscriminately using single-class information during querying, significantly compromising adaptation accuracy. To address these limitations, we propose COSMIC (Clique-Oriented Semantic Multi-space Integration for CLIP), a robust test-time adaptation framework that enhances adaptability through multi-granular, cross-modal semantic caching and graph-based querying mechanisms. Our framework introduces two key innovations: Dual Semantics Graph (DSG) and Clique Guided Hyper-class (CGH). The Dual Semantics Graph constructs complementary semantic spaces by incorporating textual features, coarse-grained CLIP features, and fine-grained DINOv2 features to capture rich semantic relationships. Building upon these dual graphs, the Clique Guided Hyper-class component leverages structured class relationships to enhance prediction robustness through correlated class selection. Extensive experiments demonstrate COSMIC's superior performance across multiple benchmarks, achieving significant improvements over state-of-the-art methods: a 15.81% gain on out-of-distribution tasks and 5.33% on cross-domain generalization with CLIP RN-50. Code is available at github.com/hf618/COSMIC.

Fanding Huang, Jingyan Jiang, Qinting Jiang, Hebei Li, Faisal Nadeem Khan, Zhi Wang • 2025
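The abstract describes the caching and querying mechanism only at a high level. To make the general ingredients concrete, here is a minimal, hypothetical sketch in pure Python: a per-class cache that keeps the lowest-entropy test features, a Tip-Adapter-style affinity exp(-beta * (1 - cos)) for cache logits, a class graph built by thresholding text-feature similarity, maximal cliques found with the basic Bron-Kerbosch algorithm, and a prediction rule where only classes sharing a clique with the zero-shot top-1 class compete. This is not the COSMIC implementation. It uses a single feature space instead of the dual CLIP/DINOv2 graphs, and every name (`ClassCache`, `class_graph`, `predict`), threshold (`tau`, `beta`, `alpha`) and the clique-based selection rule are assumptions for illustration only. The actual method is in the linked repository.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def zero_shot_logits(feature, text_feats, scale=100.0):
    # CLIP-style logits: scaled cosine similarity between unit-norm image and text features.
    return [scale * dot(feature, t) for t in text_feats]


class ClassCache:
    """Hypothetical per-class cache keeping the `capacity` lowest-entropy test features."""

    def __init__(self, num_classes, capacity=3):
        self.capacity = capacity
        self.slots = [[] for _ in range(num_classes)]  # slots[c] holds (entropy, feature) pairs

    def update(self, feature, probs):
        slot = self.slots[argmax(probs)]  # pseudo-label: the predicted top-1 class
        slot.append((entropy(probs), feature))
        slot.sort(key=lambda item: item[0])  # most confident (lowest entropy) first
        del slot[self.capacity:]  # evict the least confident entries beyond capacity

    def logits(self, feature, beta=5.0):
        # Tip-Adapter-style affinity exp(-beta * (1 - cos)), summed over each class's entries.
        return [sum(math.exp(-beta * (1.0 - dot(feature, k))) for _, k in slot)
                for slot in self.slots]


def class_graph(text_feats, tau):
    """Hypothetical class graph: connect classes whose text-feature cosine is >= tau."""
    n = len(text_feats)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dot(text_feats[i], text_feats[j]) >= tau:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration (no pivoting) of all maximal cliques."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def predict(feature, text_feats, cache, cliques, alpha=1.0):
    """Hypothetical rule: only classes sharing a clique with the zero-shot top-1 class compete."""
    zero_shot = zero_shot_logits(feature, text_feats)
    combined = [z + alpha * c for z, c in zip(zero_shot, cache.logits(feature))]
    top1 = argmax(zero_shot)
    # Every node lies in at least one maximal clique, so this set always contains top1.
    candidates = set().union(*(q for q in cliques if top1 in q))
    return max(candidates, key=combined.__getitem__)
```

For example, with three unit-norm 2-D text features where the first two are nearly parallel and the third is orthogonal, `class_graph(text_feats, tau=0.9)` yields the cliques {0, 1} and {2}. A test feature whose zero-shot top-1 is class 2 is then assigned to class 2 regardless of the cache, while a feature near classes 0 and 1 is decided between those two by the combined zero-shot and cache logits. Maximal-clique enumeration is exponential in the worst case, which is fine for a sketch but worth keeping in mind for label sets with hundreds of classes.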

Related benchmarks

Task                  Dataset       Accuracy (%)  Rank
Image Classification  Flowers102    82.1          558
Image Classification  DTD           58.2          542
Image Classification  Food101       86.6          457
Image Classification  SUN397        72.3          441
Image Classification  Aircraft      31.4          333
Image Classification  StanfordCars  71.3          312
Image Classification  Pets          94.2          245
Image Classification  Caltech101    96.8          228
Image Classification  EuroSAT       58.8          207
Image Classification  UCF101        76.2          47
(10 of 11 rows shown.)
