Visual Grounding with Attention-Driven Constraint Balancing

About

Unlike object detection, the visual grounding task requires detecting an object described by complex free-form language. To jointly model such complex semantic and visual representations, recent state-of-the-art studies adopt transformer-based models to fuse features from both modalities, and further introduce various modules that modulate visual features to align with the language expression and eliminate irrelevant, redundant information. However, their loss functions still adopt common object detection losses and supervise only the bounding box regression output, so they fail to fully optimize for these objectives. To tackle this problem, we first analyze the attention mechanisms of transformer-based models. Building on this analysis, we propose a novel framework named Attention-Driven Constraint Balancing (AttBalance) to optimize the behavior of visual features within language-relevant regions. Extensive experiments show that our method brings consistent improvements over five different models evaluated on four different benchmarks. Moreover, we attain new state-of-the-art performance by integrating our method into QRNet.
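
The point that a conventional loss supervises only the bounding box output can be made concrete with a small sketch. The code below is a minimal illustration, assuming the widely used combination of an L1 term and a generalized IoU (GIoU) term on boxes in (x1, y1, x2, y2) format; the function names and the weights are hypothetical and are not taken from the paper. Nothing in this loss refers to attention maps or language-relevant regions, which is the gap AttBalance is described as addressing.

```python
def box_area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def giou(pred, target):
    """Generalized IoU between two (x1, y1, x2, y2) boxes, in [-1, 1]."""
    # Intersection rectangle (area is 0 when the boxes do not overlap).
    inter = box_area((
        max(pred[0], target[0]), max(pred[1], target[1]),
        min(pred[2], target[2]), min(pred[3], target[3]),
    ))
    union = box_area(pred) + box_area(target) - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest axis-aligned box enclosing both inputs.
    enclose = box_area((
        min(pred[0], target[0]), min(pred[1], target[1]),
        max(pred[2], target[2]), max(pred[3], target[3]),
    ))
    if enclose <= 0:
        return iou
    return iou - (enclose - union) / enclose


def box_regression_loss(pred, target, l1_weight=1.0, giou_weight=1.0):
    """L1 + GIoU loss on a single predicted box (weights are illustrative)."""
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    return l1_weight * l1 + giou_weight * (1.0 - giou(pred, target))
```

A perfect prediction gives a loss of zero, while a disjoint prediction is penalized by both terms; in either case the only supervised quantity is the four box coordinates.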

Weitai Kang, Luowei Zhou, Junyi Wu, Changchang Sun, Yan Yan • 2024

Related benchmarks

Task                                Dataset             Accuracy (%)  Rank
Referring Expression Comprehension  RefCOCO+ (val)      77.5          345
Referring Expression Comprehension  RefCOCO (val)       87.3          335
Referring Expression Comprehension  RefCOCO (testA)     89.6          333
Referring Expression Comprehension  RefCOCOg (test)     79.63         291
Referring Expression Comprehension  RefCOCOg (val)      79.86         291
Referring Expression Comprehension  RefCOCO+ (test-A)   82            172
Referring Expression Comprehension  RefCOCO+ (test-B)   68.6          167
Referring Expression Comprehension  RefCOCO (test-B)    83.9          160
