NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

About

Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged. However, the encoding process of existing models suffers from either receptive spreading of defective regions or information loss of non-defective regions, giving rise to visually unappealing inpainting results. To address these issues, this paper proposes NÜWA-LIP, which combines a defect-free VQGAN (DF-VQGAN) with a multi-perspective sequence-to-sequence module (MP-S2S). In particular, DF-VQGAN introduces relative estimation to control receptive spreading and adopts symmetrical connections to protect information. MP-S2S further enhances visual information from complementary perspectives, including both low-level pixels and high-level tokens. Experiments show that DF-VQGAN is more robust than VQGAN. To evaluate the inpainting performance of the model, the authors built three open-domain benchmarks, on which NÜWA-LIP also outperforms recent strong baselines.

Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, Wangmeng Zuo, Nan Duan • 2022

Related benchmarks

Task                              Dataset           Result     Rank
Image Reconstruction              ImageNet1K (val)  FID 1.38   83
Language Guided Image Inpainting  MaskCOCO          FID 10.5   5
Language Guided Image Inpainting  MaskFlickr        FID 42.5   4
Language Guided Image Inpainting  MaskVG            FID 8.5    4
