
Universal Instance Perception as Object Discovery and Retrieval

About

All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks. In this work, we present a universal instance perception model of the next generation, termed UNINEXT. UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects by simply changing the input prompts. This unified formulation brings the following benefits: (1) enormous data from different tasks and label vocabularies can be exploited for jointly training general instance-level representations, which is especially beneficial for tasks lacking in training data; and (2) the unified model is parameter-efficient and can save redundant computation when handling multiple tasks simultaneously. UNINEXT shows superior performance on 20 challenging benchmarks from 10 instance-level tasks, including classical image-level tasks (object detection and instance segmentation), vision-and-language tasks (referring expression comprehension and segmentation), and six video-level object tracking tasks. Code is available at https://github.com/MasterBin-IIAU/UNINEXT.
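The "discovery and retrieval" idea in the abstract can be illustrated with a toy sketch: first propose candidate instances, then keep those whose embedding matches a prompt embedding. This is not the UNINEXT implementation. The `Instance` class, the `retrieve` function, the cosine-similarity matching, the threshold, and the two-dimensional embeddings are all hypothetical choices made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Instance:
    """A discovered candidate object (hypothetical structure, not from UNINEXT)."""
    box: tuple        # (x1, y1, x2, y2)
    embedding: list   # toy instance-level feature vector


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(instances, prompt_embedding, threshold=0.5):
    """Return candidates whose similarity to the prompt meets the threshold,
    ordered from best to worst match. The threshold value is an assumption."""
    scored = [(cosine(inst.embedding, prompt_embedding), inst) for inst in instances]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [inst for score, inst in scored if score >= threshold]


# "Discovery" step, stubbed out here with fixed toy candidates.
candidates = [
    Instance(box=(0, 0, 10, 10), embedding=[1.0, 0.0]),
    Instance(box=(5, 5, 20, 20), embedding=[0.0, 1.0]),
    Instance(box=(2, 2, 8, 8), embedding=[0.9, 0.1]),
]

# "Retrieval" step: only the prompt changes, the candidates stay the same.
category_prompt = [1.0, 0.0]     # stands in for, e.g., a category-name prompt
expression_prompt = [0.0, 1.0]   # stands in for, e.g., a language-expression prompt

print([inst.box for inst in retrieve(candidates, category_prompt)])
print([inst.box for inst in retrieve(candidates, expression_prompt)])
```

The point of the sketch is only that the same set of discovered candidates can serve different tasks when the prompt is swapped; how prompts and instances are actually encoded and matched is described in the paper and the linked repository.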

Bin Yan, Yi Jiang, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu • 2023

Related benchmarks

Task                                Dataset                 Result                             Rank
Object Detection                    COCO 2017 (val)         AP 60.6                            2454
Instance Segmentation               COCO 2017 (val)         --                                 1144
Video Object Segmentation           DAVIS 2017 (val)        J mean 77.7                        1130
Object Detection                    COCO (val)              --                                 613
Video Instance Segmentation         YouTube-VIS 2019 (val)  AP 66.9                            567
Video Object Segmentation           YouTube-VOS 2018 (val)  J Score (Seen) 79.9                493
Visual Object Tracking              TrackingNet (test)      Normalized Precision (Pnorm) 88.2  460
Visual Object Tracking              LaSOT (test)            AUC 72.4                           444
Referring Expression Comprehension  RefCOCO+ (val)          Accuracy 85.24                     345
Referring Expression Comprehension  RefCOCO (val)           Accuracy 92.64                     335

Showing 10 of 122 rows.
