Advancing Complex Video Object Segmentation via Progressive Concept Construction
About
We propose Segment Concept (SeC), a concept-driven video object segmentation (VOS) framework that shifts from conventional feature matching to the progressive construction and utilization of high-level, object-centric representations. SeC employs Large Vision-Language Models (LVLMs) to integrate visual cues across diverse frames, constructing robust conceptual priors. To balance semantic reasoning with computational overhead, SeC runs the LVLM only when a new scene appears, injecting concept-level features at those points. To rigorously assess VOS methods in scenarios demanding high-level conceptual reasoning and robust semantic understanding, we introduce the Semantic Complex Scenarios Video Object Segmentation benchmark (SeCVOS). SeCVOS comprises 160 manually annotated multi-scenario videos designed to challenge models with substantial appearance variations and dynamic scene transformations. Empirical evaluations demonstrate that SeC substantially outperforms state-of-the-art approaches, including SAM 2 and its advanced variants, on both SeCVOS and standard VOS benchmarks. In particular, SeC achieves an 11.8-point improvement over SAM 2.1 on SeCVOS, establishing a new state of the art in concept-aware VOS.
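The idea of running the expensive model only at scene changes can be illustrated with a minimal sketch. The abstract does not say how SeC detects a new scene, so the colour-histogram test below is an assumed stand-in, and `frame_histogram`, `is_new_scene`, `segment_video`, `heavy_concept_fn`, and `light_segment_fn` are hypothetical names, not part of any SeC release.

```python
import numpy as np


def frame_histogram(frame, bins=16):
    """Return a normalized per-channel intensity histogram for an HxWxC uint8 frame."""
    per_channel = [
        np.histogram(frame[..., c], bins=bins, range=(0, 256))[0]
        for c in range(frame.shape[-1])
    ]
    hist = np.concatenate(per_channel).astype(np.float64)
    return hist / hist.sum()


def is_new_scene(prev_hist, cur_hist, threshold=0.5):
    """Assumed scene-change test: L1 distance between histograms (range 0 to 2)."""
    return np.abs(prev_hist - cur_hist).sum() > threshold


def segment_video(frames, heavy_concept_fn, light_segment_fn, threshold=0.5):
    """Call the expensive concept model only on the first frame and at scene changes.

    heavy_concept_fn(frame) stands in for the LVLM; light_segment_fn(frame, concept)
    stands in for the cheaper per-frame segmenter. Both are placeholders.
    """
    concept = None
    prev_hist = None
    heavy_calls = 0
    outputs = []
    for frame in frames:
        hist = frame_histogram(frame)
        if prev_hist is None or is_new_scene(prev_hist, hist, threshold):
            concept = heavy_concept_fn(frame)  # refresh the concept-level features
            heavy_calls += 1
        outputs.append(light_segment_fn(frame, concept))
        prev_hist = hist
    return outputs, heavy_calls


if __name__ == "__main__":
    dark = np.full((4, 4, 3), 10, dtype=np.uint8)
    bright = np.full((4, 4, 3), 200, dtype=np.uint8)
    frames = [dark, dark, dark, bright, bright]
    # Stub models: the "concept" is the mean intensity, the "mask" echoes the concept.
    outs, calls = segment_video(frames, lambda f: int(f.mean()), lambda f, c: c)
    print(outs, calls)
```

In this toy run the expensive function is called twice for five frames: once on the first frame and once at the dark-to-bright cut. The gating pattern, not the particular detector, is the point of the sketch.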
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Object Segmentation | DAVIS 2017 (val) | -- | -- | 1226 |
| Visual Object Tracking | TrackingNet (test) | Normalized Precision (Pnorm) | 91.23 | 502 |
| Video Object Segmentation | YouTube-VOS 2019 (val) | -- | -- | 240 |
| Visual Object Tracking | LaSOText (test) | AUC | 60.27 | 121 |
| Video Object Segmentation | SA-V (val) | J&F Score | 82.7 | 114 |
| Video Object Segmentation | SA-V (test) | J&F | 81.7 | 110 |
| Video Object Segmentation | LVOS v2 (val) | J&F | 86.5 | 63 |
| Video Object Segmentation | MOSE v2 (val) | J&F Score | 53.8 | 17 |
| Video Object Segmentation | MOSE v1 (val) | J&F Score | 75.3 | 17 |
| Video Object Tracking | OTB (test) | AUC | 70.55 | 13 |