FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

About

Text-guided image editing with diffusion models has achieved remarkable quality but often suffers from prohibitive latency. We introduce FlashEdit, a real-time localized image editing framework for the standard inversion-based editing setting. Its efficiency and precision stem from three key innovations: (1) a Cycle-Consistent One-Step Inversion (COSI) pipeline that encourages manifold-aligned one-step inversion through cycle consistency; (2) a Background Shield (BG-Shield) technique that improves preservation of non-edited regions via structural self-attention intervention; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that promotes precise edits by suppressing semantic leakage. Experiments on PIE-Bench demonstrate a strong preservation-efficiency trade-off, with edits completed in under 0.2 seconds and a speedup of more than 150× over DDIM-based multi-step editing. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.

Junyi Wu, Zhiteng Li, Haotong Qin, Yulun Zhang, Xiaokang Yang • 2025

Related benchmarks

Task	Dataset	Result	Rank
Image Editing	PIE-Bench	PSNR: 25.33	257
Text-Guided Image Editing	General Image Editing	Speedup: 150.8	12
