# MatteFormer: Transformer-Based Image Matting via Prior-Tokens
## About
In this paper, we propose a transformer-based image matting model called MatteFormer, which takes full advantage of trimap information in the transformer block. Our method first introduces prior-tokens, each a global representation of one trimap region (foreground, background, or unknown). These prior-tokens serve as global priors and participate in the self-attention mechanism of each block. Each stage of the encoder is composed of PAST (Prior-Attentive Swin Transformer) blocks, which are based on the Swin Transformer block but differ in two aspects: 1) a PA-WSA (Prior-Attentive Window Self-Attention) layer performs self-attention not only over spatial-tokens but also over prior-tokens; 2) a prior-memory accumulates prior-tokens from previous blocks and passes them to the next block. We evaluate MatteFormer on the commonly used image matting datasets Composition-1k and Distinctions-646. Experimental results show that our proposed method achieves state-of-the-art performance by a large margin. Our code is available at https://github.com/webtoon/matteformer.
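The two ideas above — deriving one prior-token per trimap region and letting spatial tokens attend to them alongside the usual window tokens — can be sketched as follows. This is a minimal single-head NumPy illustration, not the authors' implementation: the learned Q/K/V projections, window partitioning, relative position bias, and the prior-memory across blocks are all omitted, and the function names are our own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prior_tokens(feats, trimap):
    """Build one prior-token per trimap region by average-pooling
    the spatial features of that region.

    feats:  (N, C) flattened spatial tokens
    trimap: (N,) region labels in {0: bg, 1: fg, 2: unknown}
    returns (3, C)
    """
    return np.stack([feats[trimap == r].mean(axis=0) for r in (0, 1, 2)])

def pa_wsa(feats, trimap):
    """Prior-attentive self-attention sketch: spatial queries attend to
    the spatial tokens plus the 3 prior-tokens (identity projections
    for brevity; the real layer uses learned Q/K/V weights per window)."""
    priors = prior_tokens(feats, trimap)          # (3, C)
    kv = np.concatenate([feats, priors], axis=0)  # (N + 3, C)
    scale = np.sqrt(feats.shape[1])
    attn = softmax(feats @ kv.T / scale)          # (N, N + 3)
    return attn @ kv                              # (N, C)
```

In the actual PAST block this attention runs per window, and the prior-memory concatenates the prior-tokens produced by earlier blocks so later blocks attend to an ever-growing set of global priors.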
## Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Matting | Composition-1K (test) | SAD | 23.8 | 203 |
| Matting | Distinctions-646 (test) | SAD | 23.6 | 45 |
| Natural Image Matting | Distinctions-646 (test) | SAD | 23.9 | 21 |
| Portrait Matting | PPM-100 (test) | MSE | 0.0092 | 19 |
| Semantic Image Matting | Semantic Image Matting Dataset (test) | SAD | 29.66 | 16 |
| Image Matting | AIM-500 | SAD | 26.87 | 14 |
| Image Matting | Adobe Composition-1K | SAD | 23.8 | 12 |
| Image Matting | Distinctions-646 | SAD | 23.6 | 10 |
| Image Matting | Semantic Image Matting | SAD | 23.9 | 8 |
| Interactive Matting | HIM-100K (test) | MSE | 0.0039 | 8 |