LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

About

Recent advances in text-to-image customization have enabled high-fidelity, context-rich generation of personalized images, allowing specific concepts to appear in a variety of scenarios. However, current methods struggle with combining multiple personalized models, often leading to attribute entanglement or requiring separate training to preserve concept distinctiveness. We present LoRACLR, a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model without additional individual fine-tuning. LoRACLR uses a contrastive objective to align and merge the weight spaces of these models, ensuring compatibility while minimizing interference. By enforcing distinct yet cohesive representations for each concept, LoRACLR enables efficient, scalable model composition for high-quality, multi-concept image synthesis. Our results highlight the effectiveness of LoRACLR in accurately merging multiple concepts, advancing the capabilities of personalized image generation.
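The abstract names a contrastive objective but does not give its form. As a rough illustration only, the sketch below applies a generic margin-based contrastive loss to a single linear layer. It assumes a positive pair is the merged layer's output and a concept's own LoRA output on that concept's inputs, and a negative pair is the merged output and another concept's output on the same inputs. The function name, the pairing scheme, the margin value, and the toy dimensions are all hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def contrastive_merge_loss(W_merged, concept_weights, concept_inputs, margin=1.0):
    """Generic margin-based contrastive loss (illustrative assumption, not the paper's exact loss).

    W_merged:        (d_out, d_in) candidate merged weight matrix
    concept_weights: list of (d_out, d_in) per-concept weights (base + LoRA delta)
    concept_inputs:  list of (n, d_in) input features, one array per concept
    """
    pos_terms, neg_terms = [], []
    for i, (W_i, X_i) in enumerate(zip(concept_weights, concept_inputs)):
        Y_merged = X_i @ W_merged.T
        # Positive pair: pull merged outputs toward concept i's own outputs.
        d_pos = np.linalg.norm(Y_merged - X_i @ W_i.T, axis=1)
        pos_terms.append(d_pos ** 2)
        for j, W_j in enumerate(concept_weights):
            if j != i:
                # Negative pair: keep merged outputs at least `margin` away
                # from what another concept's model produces on these inputs.
                d_neg = np.linalg.norm(Y_merged - X_i @ W_j.T, axis=1)
                neg_terms.append(np.maximum(0.0, margin - d_neg) ** 2)
    return float(np.concatenate(pos_terms + neg_terms).mean())

# Toy setup: two rank-2 LoRA updates on a shared base layer (random placeholder data).
rng = np.random.default_rng(0)
d_in, d_out, rank, n = 8, 6, 2, 16
W_base = rng.normal(size=(d_out, d_in))
concept_weights = [
    W_base + rng.normal(size=(d_out, rank)) @ rng.normal(size=(rank, d_in))
    for _ in range(2)
]
concept_inputs = [rng.normal(size=(n, d_in)) for _ in range(2)]

# Naive baseline: average the per-concept weights and score the result.
W_avg = sum(concept_weights) / len(concept_weights)
loss_avg = contrastive_merge_loss(W_avg, concept_weights, concept_inputs)
```

In a sketch like this, a merge procedure would treat `W_merged` as the trainable quantity and minimize the loss by gradient descent. The averaged weights here serve only as a simple starting point for comparison.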

Enis Simsar, Thomas Hofmann, Federico Tombari, Pinar Yanardag • 2024

Related benchmarks

Task	Dataset	Metric	Value
Multi-Concept Image Generation	12-concept dataset	Text Alignment	0.668	26
Text-to-Image Personalization	Concepts dataset	CLIP-I Score	0.674	14
Multi-concept Generation	32 concepts	DINO	0.434	5
Multi-Concept Image Generation	User Study	Identity Alignment	3.42	4
Multi-Concept Image Generation	Multi-concept generation evaluation set	Accuracy (Avg)	72.4	4
