Advancing Cache-Based Few-Shot Classification via Patch-Driven Relational Gated Graph Attention

About

Few-shot image classification remains difficult under limited supervision and visual domain shift. Recent cache-based adaptation approaches (e.g., Tip-Adapter) address this challenge to some extent by learning lightweight residual adapters over frozen features, yet they still inherit CLIP's tendency to encode global, general-purpose representations, which are not discriminative enough to adapt the generalist model to a specialist domain in low-data regimes. We address this limitation with a novel patch-driven relational refinement that learns cache adapter weights from intra-image patch dependencies rather than treating an image embedding as a monolithic vector. Specifically, we introduce a relational gated graph attention network that constructs a patch graph and performs edge-aware attention to emphasize informative inter-patch interactions, producing context-enriched patch embeddings. A learnable multi-aggregation pooling then composes these embeddings into compact, task-discriminative representations that better align cache keys with the target few-shot classes. Crucially, the proposed graph refinement is used only during training to distil relational structure into the cache, so inference incurs no additional cost beyond the standard cache lookup. Final predictions are obtained by a residual fusion of cache similarity scores with CLIP zero-shot logits. Extensive evaluations on 11 benchmarks show consistent gains over state-of-the-art CLIP adapter and cache-based baselines while preserving zero-shot efficiency. We further validate battlefield relevance by introducing an Injured vs. Uninjured Soldier dataset for casualty recognition. The dataset is motivated by the operational need to support triage decisions within the "platinum minutes" and the broader "golden hour" window in time-critical UAV-driven search-and-rescue and combat casualty care.
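The final prediction step described above (a cache lookup whose scores are added as a residual to CLIP zero-shot logits) can be illustrated with a minimal NumPy sketch. It follows the general Tip-Adapter-style formulation and is not taken from the paper: the function name `cache_fused_logits`, the hyperparameters `alpha` and `beta`, and the logit scale of 100 are assumptions made for illustration, and the patch-graph refinement that shapes the cache keys during training is not modelled here.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cache_fused_logits(query, cache_keys, cache_values, text_weights,
                       alpha=1.0, beta=5.5, logit_scale=100.0):
    """Hypothetical residual fusion of cache scores with zero-shot logits.

    query:        (B, D) image features of the test images
    cache_keys:   (N, D) features of the few-shot training images
    cache_values: (N, C) one-hot labels of the few-shot training images
    text_weights: (C, D) class text embeddings (zero-shot classifier)
    alpha, beta:  assumed hyperparameters (residual ratio, sharpness)
    """
    query = l2_normalize(query)
    cache_keys = l2_normalize(cache_keys)
    text_weights = l2_normalize(text_weights)

    # Cosine affinity between each query and each cached key: (B, N).
    affinity = query @ cache_keys.T
    # Sharpened similarity, aggregated per class via the one-hot values: (B, C).
    cache_logits = np.exp(-beta * (1.0 - affinity)) @ cache_values
    # Zero-shot logits from the text classifier: (B, C).
    zero_shot_logits = logit_scale * (query @ text_weights.T)
    # Residual fusion: the cache term is added on top of the zero-shot term.
    return zero_shot_logits + alpha * cache_logits
```

With `alpha=0.0` the function reduces to the plain zero-shot classifier, which is why inference cost stays at one extra matrix product for the cache lookup.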

Tasweer Ahmad, Arindam Sikdar, Sandip Pradhan, Ardhendu Behera • 2025

Related benchmarks

Task	Dataset	Result
Image Classification	Stanford Cars	Accuracy 78	660
Image Classification	EuroSAT	Accuracy 85.6	569
Image Classification	SUN397	Accuracy 74.8	450
Classification	Food101	--	69
Image Classification	ImageNet (INet)	Accuracy 71.2	62
Classification	Caltech101	Accuracy 96.2	39
Image Classification	FGVC Aircraft Air	Accuracy 45.3	23
Classification	UCF101	Accuracy 82.7	16
Classification	Flowers102	Top-1 Accuracy 96	12
Classification	Describable Textures (DTD)	Accuracy 70.4	11

Showing 10 of 12 rows
