FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting

About

Text-guided image inpainting aims to generate new content within specified regions of an image from user-provided textual prompts. The primary challenge is to accurately align the inpainted areas with those prompts while maintaining a high degree of visual fidelity. While existing inpainting methods have produced visually convincing results by leveraging pre-trained text-to-image diffusion models, they still struggle to uphold both prompt alignment and visual rationality simultaneously. In this work, we introduce FreeInpaint, a plug-and-play, tuning-free approach that directly optimizes the diffusion latents on the fly during inference to improve the faithfulness of the generated images. Technically, we introduce a prior-guided noise optimization method that steers model attention towards valid inpainting regions by optimizing the initial noise. Furthermore, we design a composite guidance objective tailored specifically for the inpainting task. This objective efficiently directs the denoising process, enhancing prompt alignment and visual rationality by optimizing intermediate latents at each step. Through extensive experiments involving various inpainting diffusion models and evaluation metrics, we demonstrate the effectiveness and robustness of our proposed FreeInpaint.

Chao Gong, Dong Li, Yingwei Pan, Jingjing Chen, Ting Yao, Tao Mei • 2025
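
The abstract describes optimizing latents at inference time under a composite objective. The following is only a toy sketch of that general pattern, not the FreeInpaint objective: the two quadratic losses, the function names (`composite_loss`, `guide_latent`), the weights, and the step size are all hypothetical stand-ins chosen so the example runs without a diffusion model. One term pulls masked positions toward a target (standing in for prompt alignment), the other keeps unmasked positions close to known content (standing in for visual rationality).

```python
# Toy illustration of inference-time latent guidance.
# NOT the paper's method: the losses, names, and hyperparameters are assumptions.

def composite_loss(latent, mask, target_inside, known_outside, weights=(1.0, 1.0)):
    """Weighted sum of two quadratic stand-in losses over a 1-D latent."""
    w_align, w_cons = weights
    align = sum(m * (z - t) ** 2 for z, m, t in zip(latent, mask, target_inside))
    cons = sum((1 - m) * (z - k) ** 2 for z, m, k in zip(latent, mask, known_outside))
    return w_align * align + w_cons * cons


def guide_latent(latent, mask, target_inside, known_outside,
                 weights=(1.0, 1.0), lr=0.1, steps=1):
    """Take `steps` gradient-descent steps on the latent itself (no model weights change)."""
    w_align, w_cons = weights
    z = list(latent)
    for _ in range(steps):
        grad = [
            w_align * 2.0 * m * (zi - t) + w_cons * 2.0 * (1 - m) * (zi - k)
            for zi, m, t, k in zip(z, mask, target_inside, known_outside)
        ]
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z


if __name__ == "__main__":
    latent = [1.0, -1.0, 0.5, 0.0]
    mask = [1, 1, 0, 0]                  # 1 = region to inpaint
    target = [0.0, 0.0, 0.0, 0.0]        # hypothetical alignment target
    known = [0.0, 0.0, 0.5, 1.0]         # hypothetical known background values
    before = composite_loss(latent, mask, target, known)
    guided = guide_latent(latent, mask, target, known)
    after = composite_loss(guided, mask, target, known)
    print(before, after)
```

In a real diffusion pipeline the gradient would come from automatic differentiation through the model's outputs rather than from closed-form quadratics, and a step like `guide_latent` would be interleaved with the sampler's denoising steps.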

Related benchmarks

Task	Dataset	Result	Rank
Image Inpainting	EditBench free-form masks (val)	ImageReward 0.5248	15
Text-guided image inpainting	MSCOCO with layout masks (test)	ImageReward 0.3422	15
