Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation

About

Recently, GPT-4o has garnered significant attention for its strong performance in image generation, yet open-source models still lag behind. Several studies have explored distilling image data from GPT-4o to enhance open-source models, achieving notable progress. However, a key question remains: given that real-world image datasets already constitute a natural source of high-quality data, why should we use GPT-4o-generated synthetic data? In this work, we identify two key advantages of synthetic images. First, they can complement rare scenarios in real-world datasets, such as surreal fantasy or multi-reference image generation, which frequently occur in user queries. Second, they provide clean and controllable supervision. Real-world data often contains complex background noise and inherent misalignment between text descriptions and image content, whereas synthetic images offer pure backgrounds and long-tailed supervision signals, facilitating more accurate text-to-image alignment. Building on these insights, we introduce Echo-4o-Image, a 180K-scale synthetic dataset generated by GPT-4o, harnessing the power of synthetic image data to address blind spots in real-world coverage. Using this dataset, we fine-tune the unified multimodal generation baseline Bagel to obtain Echo-4o. In addition, we propose two new evaluation benchmarks for a more accurate and challenging assessment of image generation capabilities: GenEval++, which increases instruction complexity to mitigate score saturation, and Imagine-Bench, which focuses on evaluating both the understanding and generation of imaginative content. Echo-4o demonstrates strong performance across standard benchmarks. Moreover, applying Echo-4o-Image to other foundation models (e.g., OmniGen2, BLIP3-o) yields consistent performance gains across multiple metrics, highlighting the dataset's strong transferability.

Junyan Ye, Dongzhi Jiang, Zihao Wang, Leqi Zhu, Zhenghao Hu, Zilong Huang, Jun He, Zhiyuan Yan, Jinghua Yu, Hongsheng Li, Conghui He, Weijia Li • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-image Reasoning | OmniContext | Single Scene Char Score | 8.62 | 20
Image Generation | Mind-Bench | SE Score | 0.04 | 18
Subject-driven image generation | SconeEval | Composition Single COM | 8.58 | 11
Multi-reference image generation | OmniContext MULTIPLE 1.0 (test) | Character Score | 8.07 | 10
Multi-reference image generation | OmniContext SCENE 1.0 (test) | Character Fidelity Score | 8.62 | 10
Instruction-following generation | GenEval++ (test) | Color Accuracy | 80 | 9
Image Editing | DreamOmni2Bench Editing - Add | PF Score | 8.36 | 6
Image Editing | DreamOmni2Bench Editing - Replace | PF Score | 3.85 | 6
Image Generation | DreamOmni2Bench Generation | PF | 6.68 | 6
Image Editing | DreamOmni2Bench Editing - Global | PF Score | 4.38 | 6
Showing 10 of 11 rows