
Semantic-aligned Fusion Transformer for One-shot Object Detection

About

One-shot object detection aims to detect novel objects given only a single instance. Under such extreme data scarcity, current approaches explore various feature fusions to obtain directly transferable meta-knowledge, yet their performance is often unsatisfactory. In this paper, we attribute this to inappropriate correlation methods that misalign query-support semantics by overlooking spatial structures and scale variances. Based on this analysis, we leverage the attention mechanism and propose a simple but effective architecture named Semantic-aligned Fusion Transformer (SaFT) to resolve these issues. Specifically, we equip SaFT with a vertical fusion module (VFM) for cross-scale semantic enhancement and a horizontal fusion module (HFM) for cross-sample feature fusion. Together, they broaden the view of each support feature point to the whole augmented feature pyramid of the query, facilitating semantic-aligned associations. Extensive experiments on multiple benchmarks demonstrate the superiority of our framework. Without fine-tuning on novel classes, it brings significant performance gains to one-stage baselines, lifting state-of-the-art results to a higher level.
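
The idea of letting each support feature point attend over the whole query feature pyramid can be illustrated with plain scaled dot-product cross-attention. The sketch below is not the SaFT implementation: the function names, the toy feature values, the attention direction, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(support_points, query_pyramid):
    """Hypothetical helper: each support point attends over every
    location of every pyramid level (scaled dot-product attention,
    no learned projections)."""
    # Flatten all pyramid levels into one list of feature vectors.
    keys = [point for level in query_pyramid for point in level]
    dim = len(keys[0])
    fused = []
    for s in support_points:
        scores = [
            sum(a * b for a, b in zip(s, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the query features (keys double as values here).
        fused.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return fused


# Toy data (assumed): one support point, two pyramid levels, 2-d features.
support = [[1.0, 0.0]]
pyramid = [
    [[1.0, 0.0], [0.0, 1.0]],  # finer level, two locations
    [[0.5, 0.5]],              # coarser level, one location
]
fused = cross_attention(support, pyramid)
```

Because the pyramid is flattened before the attention weights are computed, a single support point can draw on locations from every scale at once, which is the property the abstract attributes to combining the two fusion modules.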

Yizhou Zhao, Xun Guo, Yan Lu • 2022

Related benchmarks

Task              Dataset                                  Result               Rank
Object Detection  MS-COCO 2017 (val)                       Base Avg AP50: 48.3  27
Object Detection  COCO 2017 (Split-3)                      Base AP50: 47.9      6
Object Detection  COCO 2017 (Split-4)                      bAP50: 49            6
Object Detection  COCO Average across splits 2017 (Avg)    bAP50: 48.3          6
Object Detection  COCO 2017 (Split-1)                      Base AP50: 49.2      6
Object Detection  COCO 2017 (Split-2)                      bAP50: 47.2          6
Object Detection  PASCAL VOC Base classes 2007 (test)      AP (Plant): 59.7     5
Object Detection  PASCAL VOC Novel classes 2007 (test)     Cow: 88.1            5
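
As a quick consistency check, the COCO "Average across splits" figure can be recomputed from the four per-split results listed above. This sketch assumes the average is the unweighted mean of the four splits, which the page does not state explicitly; the variable names are illustrative.

```python
from statistics import mean

# Base-class AP50 per COCO split, as listed in the benchmark rows.
split_bap50 = {
    "Split-1": 49.2,
    "Split-2": 47.2,
    "Split-3": 47.9,
    "Split-4": 49.0,
}

# Assumption: the reported average is the plain mean, rounded to one decimal.
average = round(mean(split_bap50.values()), 1)
print(average)  # 48.3
```

Under that assumption the recomputed value matches the reported 48.3.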