MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism

About

Based on an analysis of the cascaded decoder architecture commonly adopted in existing DETR-like models, this paper proposes a new decoder architecture. The cascaded decoder constrains object queries to update only in the cascaded direction, so the queries learn relatively limited information from image features. However, the challenges of object detection in natural scenes (e.g., objects that are extremely small, heavily occluded, or confusingly mixed with the background) require a detection model to fully utilize image features, which motivates us to propose a new decoder architecture with a parallel Multi-time Inquiries (MI) mechanism. MI enables object queries to learn more comprehensive information, and our MI-based model, MI-DETR, outperforms all existing DETR-like models on the COCO benchmark under different backbones and training epochs, achieving improvements of +2.3 AP and +0.6 AP over the most representative model, DINO, and the SOTA model, Relation-DETR, respectively, under a ResNet-50 backbone. In addition, a series of diagnostic and visualization experiments demonstrates the effectiveness, rationality, and interpretability of MI.

Zhixiong Nan, Xianghong Li, Jifeng Dai, Tao Xiang • 2025

Related benchmarks

Task              Dataset          Result   Rank
Object Detection  COCO 2017 (val)  AP 58.2  2454
