DeAR: Fine-Grained VLM Adaptation by Decomposing Attention Head Roles
About
Prompt learning is a dominant paradigm for adapting pre-trained Vision-Language Models (VLMs) to downstream tasks. However, existing methods often rely on a simplistic, layer-centric view, assuming shallow layers capture general features while deep layers handle task-specific knowledge. This assumption leads to uncontrolled interactions between learnable tokens and original tokens: task-specific knowledge can degrade the model's core generalization, creating a trade-off between task adaptation and the preservation of zero-shot generalization. To address this, we challenge the layer-centric view and propose **DeAR**, a framework that achieves fine-grained VLM adaptation by **De**composing **A**ttention head **R**oles. We posit that functional specialization within VLMs occurs not between layers, but at the finer-grained level of individual attention heads in the deeper layers. Based on this insight, we introduce a novel metric, Concept Entropy, to systematically classify attention heads into distinct functional roles: *Attribute*, *Generalization*, and *Mixed*. Guided by these roles, we introduce specialized attribute tokens and a Role-Based Attention Mask mechanism to precisely control information flow, ensuring generalization heads remain isolated from task-specific knowledge. We further incorporate a Task-Adaptive Fusion Strategy for inference. Extensive experiments on fifteen datasets show that DeAR achieves a strong balance between task adaptation and generalization, outperforming previous methods across various tasks.
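The role-decomposition idea above can be sketched in code. The snippet below is a minimal, hypothetical illustration only: it assumes Concept Entropy is the Shannon entropy of a head's attention mass over concept tokens, and the classification thresholds (`low`, `high`) are illustrative placeholders, not values from the paper.

```python
import numpy as np

def concept_entropy(attn_weights):
    """Shannon entropy of one head's attention distribution over concept
    tokens (hypothetical interface; the paper's exact definition may differ)."""
    p = np.asarray(attn_weights, dtype=float)
    p = p / p.sum()          # normalize to a probability distribution
    p = p[p > 0]             # drop zero entries so log() is defined
    return float(-(p * np.log(p)).sum())

def classify_head(entropy, low=0.5, high=1.2):
    """Map an entropy score to a role; thresholds are illustrative only."""
    if entropy < low:
        return "Attribute"       # sharply focused on a few concepts
    if entropy > high:
        return "Generalization"  # attention spread broadly across concepts
    return "Mixed"

# A head concentrated on one concept vs. one spread uniformly.
focused = concept_entropy([0.97, 0.01, 0.01, 0.01])   # low entropy
uniform = concept_entropy([0.25, 0.25, 0.25, 0.25])   # maximal entropy, ln(4)
```

Under this sketch, the focused head would be classified as *Attribute* and the uniform one as *Generalization*; a Role-Based Attention Mask would then block learnable task tokens from attending into heads tagged *Generalization*.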
Related benchmarks
| Task | Dataset | Metric | Score | Rank |
|---|---|---|---|---|
| Base-to-New Generalization | Avg over 11 datasets | Base Score | 85.94 | 90 |
| Base-to-New Generalization | DTD | Base Accuracy | 83.9 | 82 |
| Base-to-New Generalization | ImageNet | Base Accuracy | 78.12 | 81 |
| Base-to-New Generalization | FGVCAircraft | Base Performance | 47.1 | 78 |
| Base-to-New Generalization | UCF101 | Base Accuracy | 87.9 | 71 |
| Base-to-New Generalization | OxfordPets | Base Score | 97.34 | 64 |
| Base-to-New Generalization | Caltech101 | Base Score | 99 | 58 |
| Base-to-New Generalization | StanfordCars | Base Score | 82.01 | 57 |
| Image Classification | ImageNet V2 (Target) | Accuracy | 64.87 | 48 |
| Base-to-New Generalization | Flowers102 | Base Accuracy | 99 | 43 |