# MatAnyone: Stable Video Matting with Consistent Memory Propagation

## About
Auxiliary-free human video matting methods, which rely solely on input frames, often struggle with complex or ambiguous backgrounds. To address this, we propose MatAnyone, a robust framework tailored for target-assigned video matting. Specifically, building on a memory-based paradigm, we introduce a consistent memory propagation module via region-adaptive memory fusion, which adaptively integrates memory from the previous frame. This ensures semantic stability in core regions while preserving fine-grained details along object boundaries. For robust training, we present a larger, high-quality, and diverse dataset for video matting. Additionally, we incorporate a novel training strategy that efficiently leverages large-scale segmentation data, boosting matting stability. With this new network design, dataset, and training strategy, MatAnyone delivers robust and accurate video matting results in diverse real-world scenarios, outperforming existing methods.
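To illustrate the idea behind region-adaptive memory fusion, here is a minimal sketch in NumPy. It gates, per pixel, between the propagated previous-frame memory (trusted in "core" regions, where the previous alpha is near 0 or 1) and the current frame's features (trusted near boundaries, where alpha is ambiguous). The function name, the gating rule, and the `tau` parameter are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def region_adaptive_fusion(curr_feat, prev_mem, prev_alpha, tau=0.1):
    """Hypothetical sketch of region-adaptive memory fusion.

    curr_feat:  (H, W, C) features from the current frame
    prev_mem:   (H, W, C) memory propagated from the previous frame
    prev_alpha: (H, W)    alpha matte of the previous frame, in [0, 1]
    tau:        width of the boundary band that favors current features
    """
    # Boundary uncertainty: peaks at alpha = 0.5, zero in core regions.
    uncertainty = 1.0 - np.abs(prev_alpha - 0.5) * 2.0   # in [0, 1]
    # Per-pixel gate: 1 near boundaries, 0 in confident core regions.
    w_current = np.clip(uncertainty / tau, 0.0, 1.0)
    w_current = w_current[..., None]                      # broadcast over channels
    # Core regions keep propagated memory (semantic stability);
    # boundary regions take current features (fine detail).
    return w_current * curr_feat + (1.0 - w_current) * prev_mem
```

With this gating, fully confident pixels (alpha exactly 0 or 1) pass the propagated memory through unchanged, while pixels at alpha 0.5 use only the current frame's features; `tau` controls how sharply the blend transitions between the two regimes.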
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Matting | VideoMatte 512 x 288 (test) | MAD | 2.72 | 17 |
| Video Matting | VideoMatte 512 x 288 | MAD | 2.72 | 13 |
| Video Matting | VideoMatte 1920 x 1080 | MAD | 1.99 | 13 |
| Video Matting | VideoMatte 1920 x 1080 (test) | MAD | 4.24 | 9 |
| Video Matting | Real-world benchmark | MAD | 0.19 | 8 |
| Video Matting | YoutubeMatte 1920 x 1080 (test) | MAD | 1.99 | 8 |
| Video Matting | CRGNN real-world (19 videos) | MAD | 5.76 | 7 |
| Video Matting | RVM Real-world Benchmark | MAD | 0.14 | 6 |
| Video Generation | User Study | Overall Score | 2.82 | 4 |
| Video Matting | V-HIM60 Hard 14 | MAD | 5.7195 | 4 |