
Tracking through Containers and Occluders in the Wild

About

Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment. We set up a task where the goal is, given a video sequence, to segment both the projected extent of the target object and the surrounding container or occluder whenever one exists. To study this task, we create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance under various forms of task variation, such as moving or nested containment. We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.

Basile Van Hoorick, Pavel Tokmakov, Simon Stent, Jie Li, Carl Vondrick • 2023

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Object Segmentation | TCOW Kubric Random synthetic (test) | Itgt Score (All): 53 | 11 |
| Video Object Segmentation | TCOW Kubric Containers synthetic (test) | Target IoU (All): 36.8 | 11 |
| Video Object Segmentation | Rubric Office real-world | Jtarget Score: 72.5 | 8 |
| Video Object Segmentation | Rubric DAV/YTB (real-world) | Jtarget: 52.8 | 8 |
| Video Object Segmentation | Rubric Cup Games (real-world) | Jtarget: 38.3 | 8 |
| Amodal Bounding Box Detection | TAO-Amodal custom 100-clip 1.0 | AP@25: 27.8 | 6 |
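
Several of the segmentation results listed here are overlap scores (for example "Target IoU"; in video object segmentation, J conventionally denotes the Jaccard index, which is the same quantity). As a rough illustration of how such a score can be computed, the sketch below measures intersection over union between predicted and ground-truth binary masks and averages it over the frames of a video. The function names, the list-of-lists mask layout, the per-frame averaging, and the handling of empty masks are all assumptions made for this example; they are not taken from the TCOW evaluation code.

```python
from typing import List, Sequence

# Assumed layout: a mask is a grid (rows of 0/1 values) for one video frame.
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union (Jaccard index) of two same-shaped binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumption: if both masks are empty, treat the frame as a perfect match.
    return 1.0 if union == 0 else inter / union


def mean_video_iou(preds: List[Mask], truths: List[Mask]) -> float:
    """Average the per-frame IoU over a video (hypothetical aggregation)."""
    if not preds or len(preds) != len(truths):
        raise ValueError("need the same non-zero number of frames in both lists")
    scores = [mask_iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    frame_pred = [[1, 1], [0, 0]]
    frame_true = [[1, 0], [0, 0]]
    # Two frames: one partial overlap, one exact match.
    print(mean_video_iou([frame_pred, frame_true], [frame_true, frame_true]))
```

Under this layout, a score for the occluder or container mask could be computed the same way by passing that mask pair instead of the target pair. Scores in the table appear to be on a 0 to 100 scale, so a value from this sketch would be multiplied by 100 to compare.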
