
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

About

We present T-Rex2, a highly practical model for open-set object detection. Previous open-set object detection methods relying on text prompts effectively encapsulate the abstract concept of common objects, but struggle with rare or complex object representation due to data scarcity and descriptive limitations. Conversely, visual prompts excel in depicting novel objects through concrete visual examples, but fall short in conveying the abstract concept of objects as effectively as text prompts. Recognizing the complementary strengths and weaknesses of both text and visual prompts, we introduce T-Rex2 that synergizes both prompts within a single model through contrastive learning. T-Rex2 accepts inputs in diverse formats, including text prompts, visual prompts, and the combination of both, so that it can handle different scenarios by switching between the two prompt modalities. Comprehensive experiments demonstrate that T-Rex2 exhibits remarkable zero-shot object detection capabilities across a wide spectrum of scenarios. We show that text prompts and visual prompts can benefit from each other within the synergy, which is essential to cover massive and complicated real-world scenarios and pave the way towards generic object detection. The model API is now available at https://github.com/IDEA-Research/T-Rex.
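
The abstract says only that the two prompt types are aligned "through contrastive learning" and gives no formula. As a rough illustration of what such an alignment objective can look like, here is a minimal sketch of a generic symmetric InfoNCE loss over paired text and visual prompt embeddings. This is an assumption for illustration, not the loss used by T-Rex2; the function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_alignment_loss(text_embs, visual_embs, temperature=0.07):
    """Symmetric InfoNCE: text_embs[i] and visual_embs[i] describe the same category.

    Hypothetical helper, not taken from the T-Rex2 code base.
    """
    text = [l2_normalize(v) for v in text_embs]
    visual = [l2_normalize(v) for v in visual_embs]
    # logits[i][j] = cosine similarity of text i and visual j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(t, v)) / temperature for v in visual]
        for t in text
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the text-to-visual and visual-to-text directions
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy embeddings: two categories, each with one text and one visual prompt vector
text_prompts = [[1.0, 0.0], [0.0, 1.0]]
matched_visual = [[0.9, 0.1], [0.1, 0.9]]
swapped_visual = [[0.1, 0.9], [0.9, 0.1]]

loss_matched = contrastive_alignment_loss(text_prompts, matched_visual)
loss_swapped = contrastive_alignment_loss(text_prompts, swapped_visual)
```

In this sketch the loss is small when each text embedding is closest to the visual embedding of the same category and large when the pairing is swapped, which is the behaviour a contrastive alignment objective is meant to reward.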

Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang (2024)

Related benchmarks

Task                          Dataset                   Result           Rank
Object Detection              COCO (val)                mAP 46.5         613
Object Detection              LVIS (val)                mAP 45.8         141
Object Detection              LVIS (minival)            AP 54.9          127
Object Detection              ODinW-13                  AP 50.3          98
Object Detection              LVIS mini (val)           mAP 54.9         86
Object Detection              COCO                      AP (bbox) 52.2   59
Object Detection              ODinW-35                  AP 22            59
Object Detection              ODinW 35 datasets (test)  Average AP 22    15
Interactive Object Detection  COCO (val)                AP 58.5          4
Interactive Object Detection  LVIS (minival)            AP 62.5          4
Showing 10 of 11 rows
