
AnyDoor: Zero-shot Object-level Image Customization

About

This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations in a harmonious way. Instead of tuning parameters for each object, our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage. Such a challenging zero-shot setting requires an adequate characterization of a given object. To this end, we complement the commonly used identity feature with detail features, which are carefully designed to maintain texture details yet allow versatile local variations (e.g., lighting, orientation, and posture), helping the object blend well with different surroundings. We further propose to borrow knowledge from video datasets, where we can observe various forms of a single object along the time axis, leading to stronger model generalizability and robustness. Extensive experiments demonstrate the superiority of our approach over existing alternatives as well as its great potential in real-world applications, such as virtual try-on and object moving. The project page is at https://damo-vilab.github.io/AnyDoor-Page/.
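The abstract does not say how training pairs are drawn from videos. The following is a minimal sketch of one plausible scheme, not AnyDoor's actual pipeline: it assumes each video comes with a hypothetical per-frame annotation of one tracked object (`FrameAnnotation`, `sample_training_pair`, and `min_gap` are illustrative names, not taken from the authors' code). Two frames of the same object that lie far enough apart in time give a reference view (object appearance) and a target view (scene plus placement box) that differ in pose, lighting, or orientation.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class FrameAnnotation:
    """Hypothetical record: frame index and the object's box (x0, y0, x1, y1)."""
    frame_idx: int
    box: Tuple[int, int, int, int]


def sample_training_pair(
    track: Sequence[FrameAnnotation],
    min_gap: int,
    rng: random.Random,
) -> Optional[Tuple[FrameAnnotation, FrameAnnotation]]:
    """Pick (reference, target) frames of one object at least `min_gap` frames apart.

    Assumption: the reference frame supplies the object's appearance, while the
    target frame supplies the scene and the box where the object is regenerated.
    A large temporal gap makes the two views differ more, which is the kind of
    variation the abstract says video data exposes. Returns None if no pair fits.
    """
    candidates = [
        (ref, tgt)
        for ref in track
        for tgt in track
        if abs(tgt.frame_idx - ref.frame_idx) >= min_gap
    ]
    if not candidates:
        return None
    return rng.choice(candidates)


if __name__ == "__main__":
    # Toy track: the object drifts diagonally, annotated every 5 frames.
    track = [FrameAnnotation(i, (i, i, i + 10, i + 10)) for i in range(0, 50, 5)]
    pair = sample_training_pair(track, min_gap=20, rng=random.Random(0))
    print(pair)
```

Enumerating all pairs is quadratic in track length, which is fine for a sketch; a real data loader would more likely sample the two indices directly.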

Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, Hengshuang Zhao · 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Image Captioning | Flickr30k (test) | CIDEr | 94.5 | 103
Virtual Try-On | VITON-HD (test) | SSIM | 79.6 | 48
Image Captioning | Flickr8k (test) | BLEU@4 | 36.4 | 27
Image Captioning | COCO (test) | -- | -- | 27
Virtual Try-On | DressCode Dresses (unpaired and paired) | FID (unpaired) | 33.44 | 13
Virtual Try-On | DressCode Lower (unpaired and paired) | FID (unpaired) | 23.338 | 13
Virtual Try-On | DressCode Upper (unpaired and paired) | FID (unpaired) | 25.432 | 13
Virtual Try-On | StreetTryOn Shop-to-Street | FID | 50.893 | 13
Visual Question Answering | OK-VQA 2019 | V-Score | 54.8 | 12
Virtual Try-On | StreetTryOn Model-to-Street | FID | 51.861 | 11

Showing 10 of 26 results.
