DreamO: A Unified Framework for Image Customization
About
Recently, extensive research on image customization (e.g., identity, subject, style, and background) has demonstrated strong customization capabilities in large-scale generative models. However, most approaches are designed for specific tasks, which limits their ability to generalize to combinations of different condition types. Developing a unified framework for image customization remains an open challenge. In this paper, we present DreamO, an image customization framework designed to support a wide range of tasks while enabling seamless integration of multiple conditions. Specifically, DreamO uses a diffusion transformer (DiT) framework to process inputs of different types uniformly. For training, we construct a large-scale dataset covering various customization tasks, and we introduce a feature routing constraint that facilitates precise querying of relevant information from reference images. Additionally, we design a placeholder strategy that associates specific placeholders with conditions at particular positions, enabling control over where conditions are placed in the generated results. Moreover, we employ a progressive training strategy consisting of three stages: an initial stage focused on simple tasks with limited data to establish baseline consistency, a full-scale training stage to comprehensively enhance customization capabilities, and a final quality alignment stage to correct quality biases introduced by low-quality data. Extensive experiments demonstrate that DreamO can effectively perform various image customization tasks with high quality and flexibly integrate different types of control conditions.
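The abstract does not spell out the form of the feature routing constraint. One plausible reading, sketched here purely as an assumption, is a loss that pushes attention from reference-image tokens toward the region of the generated image where that reference should appear. The function name `routing_loss`, the averaging over reference tokens, and the mean squared error against a normalized binary mask are all hypothetical choices for illustration, not DreamO's published formulation.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax over one row of attention scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def routing_loss(ref_tokens, gen_tokens, mask) -> float:
    """Hypothetical routing-style loss (illustrative assumption, not DreamO's code).

    ref_tokens: reference-image token vectors (queries), each of length D
    gen_tokens: generated-image token vectors (keys), each of length D
    mask: one 0/1 flag per generated token; 1 marks where the reference
          content should appear in the output
    """
    d = len(ref_tokens[0])
    scale = 1.0 / math.sqrt(d)
    n = len(gen_tokens)
    # Average the scaled dot-product attention distribution over all reference tokens.
    avg_attn = [0.0] * n
    for q in ref_tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in gen_tokens]
        for j, p in enumerate(softmax(scores)):
            avg_attn[j] += p / len(ref_tokens)
    # Turn the binary mask into a target distribution that also sums to 1.
    mask_total = sum(mask)
    if mask_total == 0:
        raise ValueError("mask must mark at least one generated token")
    target = [m / mask_total for m in mask]
    # Mean squared error between where attention goes and where it should go.
    return sum((a - t) ** 2 for a, t in zip(avg_attn, target)) / n


if __name__ == "__main__":
    gen = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    ref = [[5.0, 0.0]]
    print(routing_loss(ref, gen, [1, 1, 0, 0]))  # small: attention lands inside the mask
    print(routing_loss(ref, gen, [0, 0, 1, 1]))  # large: attention lands outside the mask
```

Comparing two probability distributions (attention and normalized mask) keeps the loss scale independent of how many tokens the mask covers; a real implementation would operate on batched tensors inside the DiT's attention layers rather than on Python lists.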
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Single-image editing | GEdit EN (full) | BG Change | 3.06 | 42 |
| Identity-preserving Image Generation | MultiID-Bench 1-people | Sim(GT) | 0.454 | 18 |
| Reference-based multi-human generation | MultiHuman TestBench | Count | 61.2 | 14 |
| Identity-Preserving Multi-subject Image Generation | LAMICBench++ Fewer Subjects | ITC | 90.14 | 12 |
| Identity-Preserving Multi-subject Image Generation | LAMICBench++ More Subjects | ITC | 78.49 | 12 |
| Model-free try-on | Omni-TryOn (test) | DINO-I | 40.92 | 11 |
| Multi-Human Image Generation | DiverseHumans TestPrompts (2-7 People) | Count Accuracy | 70.5 | 11 |
| Identity-preserving Image Generation | MultiID-Bench 2-people | Sim(GT) | 0.359 | 10 |
| Multi-Human Image Generation | MultiHuman TestBench (1-5 People) | Count Accuracy | 79.1 | 10 |
| Try-off | Omni-TryOn | CLIP-I | 87.55 | 10 |