Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation

About

We present Track Anything Behind Everything (TABE), a novel pipeline for zero-shot amodal video object segmentation. Unlike existing methods that require pretrained class labels, our approach uses a single query mask from the first frame where the object is visible, enabling flexible, zero-shot inference. We pose amodal segmentation as generative outpainting from modal (visible) masks using a pretrained video diffusion model. We do not need to re-train the diffusion model to accommodate additional input channels; instead, we take a pretrained model and fine-tune it at test time so that it specialises towards the tracked object. Our TABE pipeline is specifically designed to handle amodal completion, even in scenarios where objects are completely occluded. Our model and code will be released.
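To make the modal/amodal distinction concrete, here is a minimal NumPy sketch of how the two masks relate within a single frame. It is an illustration of the terminology only, not the TABE pipeline: the function names `occluded_region` and `occlusion_ratio` and the toy 4x4 frame are assumptions introduced here, and the amodal mask is simply given rather than predicted by a diffusion model.

```python
import numpy as np

def occluded_region(amodal_mask, modal_mask):
    """Pixels that belong to the object but are hidden in this frame.

    Hypothetical helper: the amodal mask covers the full object extent,
    the modal mask covers only its visible part.
    """
    amodal = np.asarray(amodal_mask, dtype=bool)
    modal = np.asarray(modal_mask, dtype=bool)
    return amodal & ~modal

def occlusion_ratio(amodal_mask, modal_mask):
    """Fraction of the object's full extent that is hidden (0.0 to 1.0)."""
    amodal = np.asarray(amodal_mask, dtype=bool)
    total = amodal.sum()
    if total == 0:
        return 0.0  # no object in the frame, so nothing is occluded
    return float(occluded_region(amodal, modal_mask).sum() / total)

# Toy frame: the object is a 2x2 square.
amodal = np.zeros((4, 4), dtype=bool)
amodal[1:3, 1:3] = True

# Partial occlusion: the right-hand column of the object is hidden.
modal_partial = amodal.copy()
modal_partial[:, 2] = False

# Complete occlusion: nothing is visible, so the modal mask is empty.
modal_none = np.zeros_like(amodal)

partial = occlusion_ratio(amodal, modal_partial)
full = occlusion_ratio(amodal, modal_none)
```

In the completely occluded case the modal mask carries no information about the object, which is why an amodal method has to rely on other frames of the video to recover its extent.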

Finlay G. C. Hudson, William A. P. Smith • 2024

Related benchmarks

Task	Dataset	Result	Rank
Amodal Bounding Box Detection	TAO-Amodal custom 100-clip 1.0	AP@25: 65.9	6

