Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

T2I-CompBench++: An Enhanced and Comprehensive Benchmark for Compositional Text-to-image Generation

About

Despite the impressive advances in text-to-image models, they often struggle to effectively compose complex scenes with multiple objects, displaying various attributes and relationships. To address this challenge, we present T2I-CompBench++, an enhanced benchmark for compositional text-to-image generation. T2I-CompBench++ comprises 8,000 compositional text prompts categorized into four primary groups: attribute binding, object relationships, generative numeracy, and complex compositions. These are further divided into eight sub-categories, including newly introduced ones like 3D-spatial relationships and numeracy. In addition to the benchmark, we propose enhanced evaluation metrics designed to assess these diverse compositional challenges. These include a detection-based metric tailored for evaluating 3D-spatial relationships and numeracy, and an analysis leveraging Multimodal Large Language Models (MLLMs), i.e. GPT-4V, ShareGPT4v as evaluation metrics. Our experiments benchmark 11 text-to-image models, including state-of-the-art models, such as FLUX.1, SD3, DALLE-3, Pixart-${\alpha}$, and SD-XL on T2I-CompBench++. We also conduct comprehensive evaluations to validate the effectiveness of our metrics and explore the potential and limitations of MLLMs.

Kaiyi Huang, Chengqi Duan, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu• 2023

Related benchmarks

TaskDatasetResultRank
Text-to-Image GenerationT2I-CompBench
Evaluation Time (min)141
12
Visual Generative Model EvaluationVisual Generative Evaluation Frameworks
Required Samples1.80e+4
3
Showing 2 of 2 rows

Other info

Follow for update