Advancing Complex Video Object Segmentation via Progressive Concept Construction

About

We propose Segment Concept (SeC), a concept-driven video object segmentation (VOS) framework that shifts from conventional feature matching to the progressive construction and utilization of high-level, object-centric representations. SeC employs Large Vision-Language Models (LVLMs) to integrate visual cues across diverse frames, constructing robust conceptual priors. To balance semantic reasoning with computational overhead, SeC forwards the LVLMs only when a new scene appears, injecting concept-level features at those points. To rigorously assess VOS methods in scenarios demanding high-level conceptual reasoning and robust semantic understanding, we introduce the Semantic Complex Scenarios Video Object Segmentation benchmark (SeCVOS). SeCVOS comprises 160 manually annotated multi-scenario videos designed to challenge models with substantial appearance variations and dynamic scene transformations. Empirical evaluations demonstrate that SeC substantially outperforms state-of-the-art approaches, including SAM 2 and its advanced variants, on both SeCVOS and standard VOS benchmarks. In particular, SeC achieves an 11.8-point improvement over SAM 2.1 on SeCVOS, establishing a new state-of-the-art in concept-aware VOS.
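The abstract says only that the LVLM is run when a new scene appears. A minimal sketch of that gating pattern follows. The scene-change test (an L1 distance between per-frame histograms against a fixed threshold) and the names `histogram_distance`, `run_gated`, and `expensive_model` are illustrative assumptions, not SeC's actual criterion or API.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
# Assumption: each frame is summarized as a normalized histogram (list of floats).
Frame = Sequence[float]


def histogram_distance(a: Frame, b: Frame) -> float:
    """L1 distance between two histograms of equal length."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(x - y) for x, y in zip(a, b))


def run_gated(
    frames: Sequence[Frame],
    expensive_model: Callable[[Frame], T],
    threshold: float = 0.5,
) -> Tuple[List[T], List[int]]:
    """Call expensive_model on the first frame and on each detected scene change.

    All other frames reuse the most recent cached output. Returns the per-frame
    outputs and the indices of the frames where the model was actually called.
    """
    outputs: List[T] = []
    call_indices: List[int] = []
    cached = None
    prev = None
    for i, frame in enumerate(frames):
        # Hypothetical scene-change rule: a large jump between consecutive frames.
        if prev is None or histogram_distance(prev, frame) > threshold:
            cached = expensive_model(frame)
            call_indices.append(i)
        outputs.append(cached)
        prev = frame
    return outputs, call_indices


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Stand-in for the costly model: returns the index of the dominant bin.
    dominant_bin = lambda f: max(range(len(f)), key=lambda k: f[k])
    outputs, calls = run_gated(frames, dominant_bin)
    print(outputs, calls)
```

In this toy run the stand-in model is called on two of the four frames, which is the trade-off the abstract describes: semantic reasoning is paid for only at scene boundaries.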

Zhixiong Zhang, Shuangrui Ding, Xiaoyi Dong, Songxin He, Jianfan Lin, Junsong Tang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Object Segmentation | DAVIS 2017 (val) | -- | -- | 1193 |
| Video Object Segmentation | YouTube-VOS 2019 (val) | -- | -- | 231 |
| Video Object Segmentation | SA-V (val) | J&F Score | 82.7 | 114 |
| Video Object Segmentation | SA-V (test) | J&F | 81.7 | 110 |
| Video Object Segmentation | LVOS v2 (val) | J&F | 86.5 | 54 |
| Video Object Segmentation | MOSE v2 (val) | J&F Score | 53.8 | 17 |
| Video Object Segmentation | MOSE v1 (val) | J&F Score | 75.3 | 17 |
| Video Object Segmentation | M³-VOS core | Jaccard (J) | 67.2 | 12 |
| Video Object Segmentation | SeCVOS No Scene Change | J&F Score | 84.2 | 7 |
| Video Object Segmentation | SeCVOS Single Scene Change | J&F | 69.6 | 7 |
Showing 10 of 14 rows
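
For readers unfamiliar with the metrics in the table: J is the Jaccard index (intersection over union) between a predicted and a ground-truth mask, and J&F is conventionally the mean of J and a boundary accuracy score F. The sketch below computes J on small binary masks and averages it with an F value supplied by the caller. The boundary F-measure itself is not implemented, the function names are illustrative, and scoring two empty masks as 1.0 is an assumed convention.

```python
from typing import Sequence

# A mask is a 2D grid of 0/1 values (rows of pixels).
Mask = Sequence[Sequence[int]]


def jaccard(pred: Mask, gt: Mask) -> float:
    """Region similarity J: |pred AND gt| / |pred OR gt|."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def j_and_f(j: float, f: float) -> float:
    """J&F: arithmetic mean of region similarity J and boundary accuracy F."""
    return (j + f) / 2


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    gt = [[1, 0], [1, 0]]
    print(jaccard(pred, gt))   # one shared pixel out of three covered pixels
    print(j_and_f(0.5, 1.0))
```

Benchmark scores such as those above are typically these per-frame values averaged over objects and videos, then reported on a 0 to 100 scale.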
