Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection

About

Open-Vocabulary Object Detection (OVD) models break the limitations of closed-set detection, enabling the identification of unseen categories through natural language prompts. However, they exhibit notable limitations in fine-grained detection tasks involving attributes like color, material, and texture. We attribute this performance bottleneck in OVD models to a core issue: when category signals dominate, OVD models tend to marginalize attribute information during inference. This leads to incorrect binding between attributes and target objects. To address this, we propose the Dual-Stage Attribute Activation (DSAA) framework, which enhances fine-grained detection capabilities by strengthening attribute semantics at two critical stages. In the text embedding stage, we employ Attribute Prefix Adapter (APA) module to generate attribute prefixes that inject explicit attribute priors. To further amplify the influence of these attributes, our Key/Value (K/V) Modulator module then intervenes during the BERT encoding phase, selectively enhancing the Key and Value vectors of the corresponding attribute tokens. In addition, we introduce an attribute-aware contrastive loss to improve discrimination among same-category instances with different attributes during training. Experimental results on the FG-OVD benchmark demonstrate the effectiveness of our method across various mainstream open-vocabulary models.

Donghong Jiang, Endian Lin, Hanqing Liu, Mingjie Liu, Luoping Cui, Zhao Yang, Chuang Zhu• 2026

Related benchmarks

TaskDatasetResultRank
Open-vocabulary object detectionFG-OVD
Score (Hard Difficulty)52.9
16
Showing 1 of 1 rows

Other info

Follow for update