
EditCLIP: Representation Learning for Image Editing

About

We introduce EditCLIP, a novel representation-learning approach for image editing. Our method learns a unified representation of edits by jointly encoding an input image and its edited counterpart, effectively capturing their transformation. To evaluate its effectiveness, we employ EditCLIP to solve two tasks: exemplar-based image editing and automated edit evaluation. In exemplar-based image editing, we replace text-based instructions in InstructPix2Pix with EditCLIP embeddings computed from a reference exemplar image pair. Experiments demonstrate that our approach outperforms state-of-the-art methods while being more efficient and versatile. For automated evaluation, EditCLIP assesses image edits by measuring the similarity between the EditCLIP embedding of a given image pair and either a textual editing instruction or the EditCLIP embedding of another reference image pair. Experiments show that EditCLIP aligns more closely with human judgments than existing CLIP-based metrics, providing a reliable measure of edit quality and structural preservation.
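The automated evaluation described above amounts to comparing embeddings. Here is a minimal sketch, assuming cosine similarity as the similarity measure (the usual choice for CLIP-style embeddings, though the abstract does not name it). The helper `edit_score` and the 4-dimensional placeholder vectors are hypothetical stand-ins for real encoder outputs and are not taken from the EditCLIP code.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def edit_score(pair_embedding: np.ndarray, reference_embedding: np.ndarray) -> float:
    """Hypothetical helper: score an edit against a reference embedding.

    `pair_embedding` stands in for the embedding of an (input, edited) image pair.
    `reference_embedding` stands in for either a text-instruction embedding
    or the embedding of another exemplar image pair.
    """
    return cosine_similarity(pair_embedding, reference_embedding)


# Placeholder vectors (assumption); real embeddings would come from trained encoders.
candidate = np.array([0.9, 0.1, 0.0, 0.4])
reference = np.array([1.0, 0.0, 0.0, 0.5])
unrelated = np.array([0.0, 1.0, 0.0, 0.0])

# An edit close to the reference should score higher than one against an unrelated target.
print(edit_score(candidate, reference) > edit_score(candidate, unrelated))
```

Under this assumption, a higher score means the observed transformation is closer to the requested or exemplified edit. The same function covers both the text-reference and the pair-reference cases, provided the embeddings share one space.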

Qian Wang, Aleksandar Cvejic, Abdelrahman Eldesokey, Peter Wonka • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Dehazing | SOTS | -- | 154 |
| Super-Resolution | FFHQ 1k | FID 77.64 | 23 |
| Image Denoising | BSD400 (test) | FID 99 | 16 |
| Image Colorization | DIV2K | FID 138.3 | 16 |
| Image Deblurring | FFHQ 1k | FID 78.75 | 16 |
| Image Deraining | Rain100L | FID 174.9 | 13 |
| Super-Resolution | User Study SR samples | Perceptual Score 0.00e+0 | 5 |
| Image Deraining | User Study DeRain samples | Perceptual Score 4.5 | 4 |
