Beyond Flat Unknown Labels in Open-World Object Detection
About
Most object detectors operate under a closed-world assumption, recognizing only the classes annotated in the training dataset and failing when they encounter novel objects. Open-World Object Detection (OWOD) relaxes this assumption by allowing unseen objects to be detected as "Unknown". However, collapsing all novel objects into a single undifferentiated label eliminates semantic granularity and limits informed decision-making. In this paper, we introduce BOUND, an open-world detector that advances OWOD by inferring coarse-grained categories of unknown objects rather than merely flagging their existence. This enriched representation offers semantic cues that may benefit real-world systems. For example, in autonomous driving, distinguishing between an "Unknown Animal" (requiring yielding) and "Unknown Debris" (requiring rerouting) leads to fundamentally different planning behaviors. Technically, BOUND integrates a sparsemax-based head for modeling objectness, a hierarchy-guided relabeling component that provides auxiliary supervision, and a classification module that learns hierarchical relationships. Experiments on OWOD benchmarks demonstrate that BOUND achieves higher unknown recall than existing baselines without sacrificing known-class mAP, while additionally enabling structured hierarchical categorization of unknown instances. Furthermore, evaluations on the long-tail LVIS dataset demonstrate robust generalization. Code will be made available.
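The abstract names a sparsemax-based objectness head without giving details. As background, the sketch below shows only the standard sparsemax transform (Martins and Astudillo, 2016), the Euclidean projection of a score vector onto the probability simplex. Unlike softmax, it can assign exactly zero probability to low scores. How BOUND feeds scores into sparsemax or turns its output into objectness is not described here, so the input vector below is a hypothetical set of logits, not the authors' head.

```python
from typing import List


def sparsemax(z: List[float]) -> List[float]:
    """Project scores z onto the probability simplex (sparsemax).

    Generic sketch of the transform only; the wiring into BOUND's
    objectness head is an assumption left out of this example.
    """
    if not z:
        raise ValueError("sparsemax needs at least one score")

    # Sort scores in descending order to find the support size k(z).
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k_z = 0
    support_sum = 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        # The k-th largest score stays in the support while
        # 1 + k * z_(k) > sum of the top-k scores (k = 1 always qualifies).
        if 1.0 + k * zk > cumsum:
            k_z = k
            support_sum = cumsum

    # Threshold tau makes the clipped outputs sum to exactly 1.
    tau = (support_sum - 1.0) / k_z
    return [max(zi - tau, 0.0) for zi in z]


if __name__ == "__main__":
    # Hypothetical logits: only the dominant score keeps nonzero mass.
    print(sparsemax([2.0, 1.0, 0.1]))  # [1.0, 0.0, 0.0]
```

The design point this illustrates is sparsity: softmax would give every entry some probability, whereas sparsemax truncates weak scores to zero, which is one plausible reason to prefer it when objectness should be decisive.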
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Open World Object Detection | OWOD Task 1 | U-Recall | 22.6 | 13 |
| Open World Object Detection | OWOD Task 2 | U-Recall | 24.8 | 13 |
| Open World Object Detection | OWOD Task 3 | U-Recall | 28.3 | 13 |
| Open World Object Detection | OWOD Task 4 | mAP | 44.4 | 13 |
| Open-Vocabulary Object Detection | LVIS | -- | -- | 7 |