ProPainter: Improving Propagation and Transformer for Video Inpainting

About

Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms in video inpainting (VI). Despite the effectiveness of these components, they still suffer from some limitations that affect their performance. Previous propagation-based approaches are performed separately either in the image or feature domain. Global image propagation isolated from learning may cause spatial misalignment due to inaccurate optical flow. Moreover, memory or computational constraints limit the temporal range of feature propagation and video Transformer, preventing exploration of correspondence information from distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting global correspondences reliably. We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens. With these components, ProPainter outperforms prior arts by a large margin of 1.46 dB in PSNR while maintaining appealing efficiency.
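The mask-guided idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact selection rule, so the rule used here is an assumption: a spatial window is kept only if it contains at least one masked pixel in any frame, and all other windows are skipped. The function name `select_masked_windows` and the `window` parameter are hypothetical and are not taken from the ProPainter code.

```python
def select_masked_windows(mask, window):
    """Mark which non-overlapping windows overlap the inpainting mask.

    mask:   list of frames; each frame is a list of rows of 0/1 values,
            where 1 marks a pixel that must be filled in.
    window: side length of the square windows (hypothetical parameter).

    Returns a 2D list of booleans, one per window. A window is True if
    any frame has a masked pixel inside it (assumed rule, see lead-in).
    """
    height, width = len(mask[0]), len(mask[0][0])
    if height % window or width % window:
        raise ValueError("frame size must be divisible by the window size")

    rows, cols = height // window, width // window
    keep = [[False] * cols for _ in range(rows)]

    for frame in mask:
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                if value:
                    # Integer division maps a pixel to its window index.
                    keep[y // window][x // window] = True
    return keep


# Two 8x8 frames with one masked pixel each.
frames = [[[0] * 8 for _ in range(8)] for _ in range(2)]
frames[0][1][1] = 1
frames[1][6][5] = 1
keep = select_masked_windows(frames, window=4)
```

In this toy case only two of the four windows are kept, so an attention layer restricted to kept windows would process half of the tokens. The sketch is written without dependencies for clarity; an array library would be the usual choice for real video tensors.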

Shangchen Zhou, Chongyi Li, Kelvin C.K. Chan, Chen Change Loy • 2023

Related benchmarks

Task	Dataset	Result
Video Reconstruction	DAVIS	PSNR 24.13	33
Video Object Removal	Real-World Videos	Temporal Consistency Score 3.89	25
Video Object Removal	Scene-Bench	Removal Completeness 3.9091	16
Video Object Removal	DAVIS (test)	Motion Smoothness 0.9748	14
Video Inpainting	HQVI	PSNR 30.69	13
Background layer reconstruction	Synthetic Movie scenes OmnimatteRF benchmark (test)	PSNR 31.06	13
Video Object Removal	ROSE Bench	LPIPS 0.1281	13
Video Inpainting	YouTube-VOS 2018 (test)	PSNR 34.43	10
Video Inpainting	DAVIS 2017 (test)	PSNR 34.47	10
Video Object Removal	DAVIS	TokSim 28.24	10

Showing 10 of 51 rows
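
Several rows above report PSNR in dB. For reference, a minimal sketch of the standard definition, PSNR = 10 log10(MAX^2 / MSE), follows. It assumes 8-bit pixel values with a peak of 255 and takes flattened pixel sequences. The benchmarks listed here may differ in details such as colour space or how scores are averaged over frames, so this is not a reproduction of any particular evaluation protocol.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    max_value is the assumed peak pixel value (255 for 8-bit images).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under this definition, and for a single frame at a fixed peak value, a gain of 1.46 dB corresponds to a mean squared error lower by a factor of 10^0.146, or about 1.4.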
