MatAnyone: Stable Video Matting with Consistent Memory Propagation
About
Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To address this, we propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries. For robust training, we present a larger, high-quality, and diverse dataset for video matting. Additionally, we incorporate a novel training strategy that efficiently leverages large-scale segmentation data, boosting matting stability. With this new network design, dataset, and training strategy, MatAnyone delivers robust and accurate video matting results in diverse real-world scenarios, outperforming existing methods.
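The abstract does not give the fusion rule, so the following is only a minimal sketch of what region-adaptive memory fusion could look like. It assumes a per-element change probability decides how much to trust the current memory readout versus the value carried over from the previous frame. The function name `fuse_memory` and its arguments are hypothetical and not taken from the MatAnyone code.

```python
def fuse_memory(current, previous, change_prob):
    """Blend two flat feature lists element by element.

    Assumption (not stated in the abstract): change_prob[i] lies in [0, 1].
    Values near 1 favor the current readout (e.g. boundary regions that
    change between frames); values near 0 carry over the previous frame's
    value (e.g. stable core regions).
    """
    if not (len(current) == len(previous) == len(change_prob)):
        raise ValueError("all inputs must have the same length")
    return [
        p * cur + (1.0 - p) * prev
        for cur, prev, p in zip(current, previous, change_prob)
    ]


# Usage: the first element is treated as stable, the second as changing.
fused = fuse_memory([1.0, 1.0], [0.0, 0.0], [0.0, 1.0])
```

Under this assumption, stable regions inherit their features from the previous frame, which is one way to obtain the semantic stability in core regions that the abstract describes.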
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Matting | VideoMatte 512 x 288 (test) | MAD | 2.72 | 17 |
| Video Matting | VideoMatte 512 x 288 | MAD | 2.72 | 13 |
| Video Matting | VideoMatte 1920 x 1080 | MAD | 1.99 | 13 |
| Matting | CineMatte-4K-Image | MAD | 1.975 | 10 |
| Video Matting | VideoMatte240K (test) | MAD | 4.902 | 10 |
| Video Matting | YouTubeMatte (test) | MAD | 2.667 | 10 |
| Video Matting | VideoMatte 1920 x 1080 (test) | MAD | 4.24 | 9 |
| Video Matting | Real-world benchmark | MAD | 0.19 | 8 |
| Video Matting | YouTubeMatte 1920 x 1080 (test) | MAD | 1.99 | 8 |
| Video Matting | VideoMatte | MAD | 4.37 | 8 |
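
The table reports MAD without defining it. The sketch below assumes MAD means the mean absolute difference between predicted and ground-truth alpha values in [0, 1], multiplied by 1000 for readability, which is a common convention in video matting evaluation. Neither the definition nor the scale factor is confirmed by this page, and the function name is hypothetical.

```python
def mean_absolute_difference(pred, gt, scale=1000.0):
    """Mean absolute difference between two alpha mattes.

    pred and gt are 2D lists (rows of alpha values in [0, 1]) of the same
    shape. The scale factor of 1000 is an assumed reporting convention.
    """
    flat_pred = [v for row in pred for v in row]
    flat_gt = [v for row in gt for v in row]
    if len(flat_pred) != len(flat_gt) or not flat_pred:
        raise ValueError("mattes must be non-empty and the same size")
    total = sum(abs(p - g) for p, g in zip(flat_pred, flat_gt))
    return scale * total / len(flat_pred)


# Usage: one of four pixels is off by 0.5.
score = mean_absolute_difference(
    [[0.0, 1.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.0]],
)
```

Lower values indicate a closer match to the ground-truth matte.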