Advancing Cache-Based Few-Shot Classification via Patch-Driven Relational Gated Graph Attention

About

Few-shot image classification remains difficult under limited supervision and visual domain shift. Recent cache-based adaptation approaches (e.g., Tip-Adapter) partially address this challenge by learning lightweight residual adapters over frozen features, yet they still inherit CLIP's tendency to encode global, general-purpose representations, which are not discriminative enough to adapt a generalist model to a specialist domain in low-data regimes. We address this limitation with a novel patch-driven relational refinement that learns cache adapter weights from intra-image patch dependencies rather than treating an image embedding as a monolithic vector. Specifically, we introduce a relational gated graph attention network that constructs a patch graph and performs edge-aware attention to emphasize informative inter-patch interactions, producing context-enriched patch embeddings. A learnable multi-aggregation pooling then composes these into compact, task-discriminative representations that better align cache keys with the target few-shot classes. Crucially, the proposed graph refinement is used only during training to distil relational structure into the cache, so it adds no inference cost beyond a standard cache lookup. Final predictions are obtained by residual fusion of cache similarity scores with CLIP zero-shot logits. Extensive evaluations on 11 benchmarks show consistent gains over state-of-the-art CLIP adapter and cache-based baselines while preserving zero-shot efficiency. We further demonstrate battlefield relevance by introducing an Injured vs. Uninjured Soldier dataset for casualty recognition, motivated by the operational need to support triage decisions within the "platinum minutes" and the broader "golden hour" in time-critical UAV-driven search-and-rescue and combat casualty care.
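
The abstract describes the refinement stage only at a high level, so the following Python/NumPy sketch shows one generic way a gated, edge-aware graph attention layer and a learnable multi-aggregation pooling could operate on patch embeddings. It is not the authors' implementation: the function names, the fully connected patch graph, the use of cosine similarity between patches as the edge feature, the sigmoid gate, and the mean/max/attention mix in the pooling are all assumptions made for illustration. Per the abstract, a stage like this would run only during training; how its output is used to learn the cache adapter weights is not specified here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def gated_graph_attention(patches, w_q, w_k, w_v, w_g, edge_scale):
    """Hypothetical edge-aware gated attention over a fully connected patch graph.

    patches: (N, d) patch embeddings; w_q, w_k, w_v: (d, d); w_g: (2d, d);
    edge_scale: scalar weight on the edge feature (learnable in practice).
    """
    d = patches.shape[1]
    # Edge feature (assumption): cosine similarity between every pair of patches.
    unit = patches / np.linalg.norm(patches, axis=1, keepdims=True)
    edges = unit @ unit.T  # (N, N)
    q, k, v = patches @ w_q, patches @ w_k, patches @ w_v
    # Edge-aware score: scaled dot-product attention plus the weighted edge feature.
    scores = q @ k.T / np.sqrt(d) + edge_scale * edges
    attn = softmax(scores, axis=1)  # each row sums to 1
    messages = attn @ v  # (N, d) context aggregated from all patches
    # Sigmoid gate decides, per patch and channel, how much context to accept.
    gate_in = np.concatenate([patches, messages], axis=1) @ w_g
    gate = 1.0 / (1.0 + np.exp(-gate_in))
    return gate * messages + (1.0 - gate) * patches  # context-enriched patches


def multi_aggregation_pool(h, mix_logits, w_att):
    """Hypothetical learnable mix of mean, max and attention pooling over patches."""
    mean_pool = h.mean(axis=0)
    max_pool = h.max(axis=0)
    att = softmax(h @ w_att, axis=0)  # (N,) weights over patches
    att_pool = att @ h
    mix = softmax(mix_logits)  # three mixing weights that sum to 1
    return mix[0] * mean_pool + mix[1] * max_pool + mix[2] * att_pool


# Toy usage with random weights (a 7x7 grid of 16-dim patch embeddings).
rng = np.random.default_rng(0)
N, d = 49, 16
patches = rng.standard_normal((N, d))
w_q, w_k, w_v = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
w_g = 0.1 * rng.standard_normal((2 * d, d))
h = gated_graph_attention(patches, w_q, w_k, w_v, w_g, edge_scale=1.0)
emb = multi_aggregation_pool(h, np.zeros(3), 0.1 * rng.standard_normal(d))
print(h.shape, emb.shape)  # (49, 16) (16,)
```

In this sketch the gate lets each patch keep its own feature where neighbouring context is uninformative, and the softmax over the three pooling logits lets training decide how much each aggregation contributes to the compact embedding.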

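At inference only the cache lookup and the residual fusion remain. The sketch below follows the published Tip-Adapter formulation: cosine affinities between a query feature and the cached few-shot keys are sharpened as exp(-beta * (1 - similarity)), multiplied by one-hot cache values, and added to the CLIP zero-shot logits with weight alpha. The function names and the default alpha, beta and logit_scale values are illustrative assumptions, not settings reported for this paper, and toy orthogonal vectors stand in for real CLIP features; in the proposed method the cache keys would be those refined during training.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def cache_fusion_logits(test_feat, cache_keys, cache_labels, text_weights,
                        alpha=1.0, beta=5.5, logit_scale=100.0):
    """Tip-Adapter-style inference: CLIP zero-shot logits plus a residual cache term.

    test_feat:    (B, d) features of the query images from the frozen CLIP encoder
    cache_keys:   (C*K, d) features of the K labelled shots for each of C classes
    cache_labels: (C*K,) integer class ids of the cached shots
    text_weights: (C, d) CLIP text embeddings of the class prompts
    """
    f = l2_normalize(test_feat)
    keys = l2_normalize(cache_keys)
    num_classes = text_weights.shape[0]
    values = np.eye(num_classes)[cache_labels]  # one-hot cache values, (C*K, C)
    zero_shot = logit_scale * f @ l2_normalize(text_weights).T  # (B, C)
    affinity = np.exp(-beta * (1.0 - f @ keys.T))  # (B, C*K), each in (0, 1]
    cache_logits = affinity @ values  # (B, C): summed affinity per class
    return zero_shot + alpha * cache_logits  # residual fusion


# Toy check with orthogonal class embeddings: 4 classes, 2 shots each, d = 8.
C, K, d = 4, 2, 8
text_w = np.eye(C, d)
labels = np.repeat(np.arange(C), K)  # [0 0 1 1 2 2 3 3]
keys = text_w[labels]  # each cached shot equals its class embedding
query = text_w[2:3]  # a single query identical to class 2
logits = cache_fusion_logits(query, keys, labels, text_w)
print(logits.argmax(axis=1))  # [2]
```

With these toy inputs the query matches class 2 in both terms: the zero-shot logit is 100 and each of the two cached class-2 shots contributes an affinity of exp(0) = 1, giving 102, while every other class receives only 2 * exp(-5.5).
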
Tasweer Ahmad, Arindam Sikdar, Sandip Pradhan, Ardhendu Behera · 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Classification | EuroSAT | Accuracy | 85.6 | 497
Image Classification | Stanford Cars | Accuracy | 78 | 477
Image Classification | SUN397 | Accuracy | 74.8 | 246
Classification | Food101 | -- | -- | 51
Image Classification | ImageNet (INet) | Accuracy | 71.2 | 50
Classification | Caltech101 | Accuracy | 96.2 | 34
Image Classification | FGVC Aircraft (Air) | Accuracy | 45.3 | 23
Classification | UCF101 | Accuracy | 82.7 | 12
Classification | OxfordPets | Accuracy | 89.4 | 10
Image Classification | Injured vs. Uninjured soldier dataset | Accuracy | 94.9 | 10
(10 of 12 rows shown)
