VL-SAM-V2: Open-World Object Detection with General and Specific Query Fusion

About

Current perception models have achieved remarkable success by leveraging large-scale labeled datasets, but still face challenges in open-world environments with novel objects. To address this limitation, researchers introduce open-set perception models to detect or segment arbitrary test-time user-input categories. However, open-set models rely on human involvement to provide predefined object categories as input during inference. More recently, researchers have framed a more realistic and challenging task known as open-ended perception that aims to discover unseen objects without requiring any category-level input from humans at inference time. Nevertheless, open-ended models suffer from low performance compared to open-set models. In this paper, we present VL-SAM-V2, an open-world object detection framework that is capable of discovering unseen objects while achieving favorable performance. To achieve this, we combine queries from open-set and open-ended models and propose a general and specific query fusion module to allow different queries to interact. By adjusting queries from open-set models, we enable VL-SAM-V2 to be evaluated in the open-set or open-ended mode. In addition, to learn more diverse queries, we introduce ranked learnable queries to match queries with proposals from open-ended models by sorting. Moreover, we design a denoising point training strategy to facilitate the training process. Experimental results on LVIS show that our method surpasses the previous open-set and open-ended methods, especially on rare objects.
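The abstract says only that the ranked learnable queries are matched with proposals from open-ended models "by sorting." The Python sketch below illustrates one plausible reading of that step. It assumes each proposal carries a confidence score and that proposals are sorted by that score in descending order, so that learnable query slot k always receives the k-th most confident proposal. The names `Proposal` and `rank_match` and the (x1, y1, x2, y2) box format are hypothetical illustrations, not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


# Hypothetical proposal record: a box from an open-ended model plus a confidence.
# The (x1, y1, x2, y2) box format is an assumption for illustration only.
@dataclass
class Proposal:
    box: Tuple[float, float, float, float]
    score: float


def rank_match(
    proposals: Sequence[Proposal], num_queries: int
) -> List[Optional[Proposal]]:
    """Assign proposals to learnable query slots by rank (assumed: by score).

    Slot k gets the proposal with the k-th highest score. Surplus proposals
    are dropped, and slots left without a proposal are filled with None.
    """
    if num_queries < 0:
        raise ValueError("num_queries must be non-negative")
    # Sort once by confidence, highest first; this fixes the rank order.
    ranked = sorted(proposals, key=lambda p: p.score, reverse=True)
    # Keep at most num_queries proposals, one per learnable query slot.
    slots: List[Optional[Proposal]] = list(ranked[:num_queries])
    # Pad unmatched slots so every learnable query has an entry.
    slots.extend([None] * (num_queries - len(slots)))
    return slots
```

For example, with three proposals scored 0.3, 0.9, and 0.6 and four query slots, the slots receive the proposals scored 0.9, 0.6, and 0.3, and the fourth slot stays empty. Binding each learnable query to a fixed rank, rather than to an arbitrary proposal, is one way such a scheme could push different queries to specialize. The abstract does not say which score is sorted on or how a matched proposal is injected into its query, so both remain open in this sketch.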

Zhiwei Lin, Yongtao Wang • 2025

Related benchmarks

Task | Dataset | Result | Rank
Object Detection | LVIS (val) | mAP 42.5 | 170
Object Detection | LVIS (minival) | AP 31.8 | 159
Object Detection | LVIS mini (val) | mAP 51.7 | 120
Object Detection | COCO | AP 56 | 21
Open-ended instance segmentation | LVIS mini (val) | AP (Mask) 28.7 | 3
