
VideoMaMa: Mask-Guided Video Matting via Generative Prior

About

Generalizing video matting models to real-world videos remains a significant challenge due to the scarcity of labeled data. To address this, we present the Video Mask-to-Matte Model (VideoMaMa), which converts coarse segmentation masks into pixel-accurate alpha mattes by leveraging pretrained video diffusion models. VideoMaMa demonstrates strong zero-shot generalization to real-world footage, even though it is trained solely on synthetic data. Building on this capability, we develop a scalable pseudo-labeling pipeline for large-scale video matting and construct the Matting Anything in Video (MA-V) dataset, which offers high-quality matting annotations for more than 50K real-world videos spanning diverse scenes and motions. To validate the effectiveness of this dataset, we fine-tune the SAM2 model on MA-V to obtain SAM2-Matte, which outperforms the same model trained on existing matting datasets in robustness on in-the-wild videos. These findings emphasize the importance of large-scale pseudo-labeled video matting data and showcase how generative priors and accessible segmentation cues can drive scalable progress in video matting research.
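
The abstract describes a pseudo-labeling flow: prompt a segmenter for coarse masks, refine them into soft alpha mattes with VideoMaMa, then use the mattes as training labels. The sketch below illustrates only that data flow under stated assumptions; `segment_video`, `box_blur`, and `pseudo_label` are hypothetical stand-ins (the real steps would call SAM2 and the VideoMaMa diffusion model, whose APIs this page does not document).

```python
from typing import List

import numpy as np


def segment_video(frames: List[np.ndarray]) -> List[np.ndarray]:
    # Stand-in for a promptable video segmenter (e.g., SAM2) that yields
    # coarse per-frame binary masks; real masks would come from the model.
    return [(frame.mean(axis=-1) > 127).astype(np.float32) for frame in frames]


def box_blur(mask: np.ndarray, k: int = 7) -> np.ndarray:
    # Cheap separable box blur used only to mimic the soft matte edges a
    # generative refiner would produce; values stay in [0, 1].
    kernel = np.ones(k) / k
    blurred = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 0, mask)
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, blurred)


def pseudo_label(frames: List[np.ndarray]) -> List[np.ndarray]:
    # 1) get coarse segmentation cues, 2) refine them into soft alpha mattes.
    # Step 2 is where VideoMaMa's diffusion-based mask-to-matte refinement
    # would run; the blur below is a placeholder, not the actual model.
    masks = segment_video(frames)
    return [box_blur(mask) for mask in masks]


if __name__ == "__main__":
    video = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
             for _ in range(4)]
    mattes = pseudo_label(video)
    print(len(mattes), mattes[0].shape,
          float(mattes[0].min()), float(mattes[0].max()))
```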

Sangbeom Lim, Seoung Wug Oh, Jiahui Huang, Heeji Yoon, Seungryong Kim, Joon-Young Lee • 2026

Related benchmarks

Task            Dataset                          Result        Rank
Video Matting   V-HIM60 Hard                     MAD 1.306     29
Video Matting   YouTubeMatte 1920x1080 (test)    MAD 0.934     20
Video Matting   V-HIM60 Easy                     MAD 1.3446    4
Video Matting   V-HIM60 Medium                   MAD 2.271     4
Video Matting   V-HIM60 Hard                     MAD 2.6112    4
Video Matting   YouTubeMatte 1920x1080           MAD 1.2695    4

Other info

GitHub
