MAT: Mask-Aware Transformer for Large Hole Image Inpainting
About
Recent studies have shown the importance of modeling long-range interactions in the inpainting problem. To achieve this goal, existing approaches exploit either standalone attention techniques or transformers, but usually at low resolution due to computational cost. In this paper, we present a novel transformer-based model for large hole inpainting, which unifies the merits of transformers and convolutions to efficiently process high-resolution images. We carefully design each component of our framework to guarantee the high fidelity and diversity of recovered images. Specifically, we customize an inpainting-oriented transformer block, where the attention module aggregates non-local information only from the valid tokens indicated by a dynamic mask. Extensive experiments demonstrate the state-of-the-art performance of the new model on multiple benchmark datasets. Code is released at https://github.com/fenglinglwb/MAT.
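The masked attention described above can be illustrated with a minimal single-head NumPy sketch. This is not the paper's implementation: the function name is hypothetical, and queries/keys/values are identity projections for brevity. The key idea is that attention scores over invalid (hole) tokens are set to negative infinity before the softmax, so each output aggregates information only from valid tokens.

```python
import numpy as np

def mask_aware_attention(x, valid, scale=None):
    """Single-head mask-aware attention sketch.

    x:     (N, D) token features
    valid: (N,) boolean mask, True = valid token (at least one required)

    Each token attends only to tokens marked valid by the mask;
    Q/K/V are identity projections for illustration only.
    """
    N, D = x.shape
    scale = scale or 1.0 / np.sqrt(D)
    scores = (x @ x.T) * scale                  # (N, N) similarity
    scores[:, ~valid] = -np.inf                 # block invalid keys
    scores -= scores.max(axis=1, keepdims=True) # numerically stable softmax
    weights = np.exp(scores)                    # exp(-inf) = 0 for invalid keys
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x                          # convex combo of valid tokens
```

In the actual model the mask is dynamic, i.e. updated as the network progressively fills in the hole, so the set of valid tokens grows across blocks.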
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Inpainting | Places2 Wide Mask 512x512 (test) | FID | 4.76 | 30 |
| Image Inpainting | Places2 512x512 (test) | LPIPS | 0.099 | 20 |
| Image Inpainting | CelebA-HQ 256x256 (test) | FID | 2.94 | 19 |
| Image Inpainting | Places 512x512 (test) | FID | 0.78 | 18 |
| Image Inpainting | MISATO @512 (test) | LPIPS | 0.176 | 17 |
| Image Inpainting | CelebA-HQ 512x512 (test) | LPIPS | 0.065 | 16 |
| Inpainting | Places2 512x512 Narrow Mask (test) | FID | 0.98 | 15 |
| Inpainting | Places2 Medium Mask 512x512 (test) | FID | 2.45 | 15 |