AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
About
Instruction-based image editing aims to modify specific image elements with natural language instructions. However, current models in this domain often struggle to accurately execute complex user instructions, as they are trained on low-quality data with limited editing types. We present AnyEdit, a comprehensive multi-modal instruction editing dataset, comprising 2.5 million high-quality editing pairs spanning over 20 editing types and five domains. We ensure the diversity and quality of the AnyEdit collection through three aspects: initial data diversity, adaptive editing process, and automated selection of editing results. Using the dataset, we further train a novel AnyEdit Stable Diffusion with task-aware routing and learnable task embedding for unified image editing. Comprehensive experiments on three benchmark datasets show that AnyEdit consistently boosts the performance of diffusion-based editing models. This presents prospects for developing instruction-driven image editing models that support human creativity.
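The abstract names task-aware routing and a learnable task embedding but does not spell out the mechanism. As a rough illustration only, the sketch below shows one common way such routing can work: each editing type owns an embedding vector, and that vector is scored against per-expert keys to produce mixing weights. The names `TaskRouter`, `expert_keys`, and `mix` are hypothetical, the vectors are random rather than trained, and none of this is taken from the AnyEdit implementation.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class TaskRouter:
    """Hypothetical router: one embedding per editing type, one key per expert."""

    def __init__(self, task_names, num_experts, dim, seed=0):
        rng = random.Random(seed)
        # In a real model these vectors would be learned; here they are random.
        self.task_embedding = {
            name: [rng.gauss(0.0, 1.0) for _ in range(dim)] for name in task_names
        }
        self.expert_keys = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)
        ]

    def route(self, task_name):
        """Return one mixing weight per expert for the given editing type."""
        emb = self.task_embedding[task_name]
        return softmax([dot(emb, key) for key in self.expert_keys])


def mix(expert_outputs, weights):
    """Weighted sum of expert output vectors."""
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


router = TaskRouter(["add", "remove", "style_transfer"], num_experts=3, dim=4)
weights = router.route("remove")
combined = mix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], weights)
```

In this toy setup the same editing type always yields the same expert weights, so different editing types can lean on different experts while sharing one backbone.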
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Editing | ImgEdit-Bench | Overall Score | 2.45 | 132 |
| Image Editing | GEdit-Bench English | G_O (Overall Quality) | 3.21 | 73 |
| Image Editing | KRIS-Bench | Factual Knowledge Score | 39.26 | 65 |
| Instructive image editing | EMU Edit (test) | CLIP Image Similarity | 0.872 | 46 |
| Image Editing | GEdit-Bench | Semantic Consistency | 3.18 | 46 |
| Instruction-based Image Editing | ImgEdit Bench 1.0 (test) | Add Score | 3.18 | 37 |
| Image Editing | AnyEdit (test) | CLIP Score (Input) | 0.867 | 28 |
| Image Editing | GEdit-Bench-EN v1.0 (Full set) | G Score (SC) | 3.053 | 22 |
| Instructive image editing | MagicBrush (test) | CLIP Image | 0.898 | 20 |
| Instruction-guided image editing | GEdit-Bench EN Full set | G_SC | 3.18 | 20 |