
Science-T2I: Addressing Scientific Illusions in Image Synthesis

About

Current image generation models produce visually compelling but scientifically implausible images, exposing a fundamental gap between visual fidelity and physical realism. In this work, we introduce Science-T2I, an expert-annotated dataset comprising a training set of over 20k adversarial image pairs and 9k prompts across 16 scientific domains, plus an isolated test set of 454 challenging prompts. Using this benchmark, we evaluate 18 recent image generation models and find that none scores above 50 out of 100 under implicit scientific prompts, while explicit prompts that directly describe the intended outcome yield scores roughly 35 points higher. This confirms that current models can render correct scenes when told what to depict but cannot reason from scientific cues to the correct visual outcome. To address this, we develop SciScore, a reward model fine-tuned from CLIP-H that captures fine-grained scientific phenomena without relying on language-guided inference, surpassing GPT-4o and experienced human evaluators by roughly 5 points. We further propose a two-stage alignment framework combining supervised fine-tuning with masked online fine-tuning to inject scientific knowledge into generative models. Applying this framework to FLUX.1[dev] yields a relative improvement exceeding 50% on SciScore, demonstrating that scientific reasoning in image generation can be substantially improved through targeted data and alignment.
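To make the image-pair setup concrete, the following is a minimal sketch of how a CLIP-style scorer could be evaluated on a two-choice task: each example pairs one prompt with two candidate images, and the scorer picks whichever image embedding is closer to the prompt embedding. This is an illustration under stated assumptions, not the authors' implementation: the function names (`cosine`, `pick_image`, `two_choice_accuracy`) and the toy 2-D embeddings are hypothetical, and a real evaluation would obtain embeddings from a trained text and image encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pick_image(prompt_emb, image_emb_a, image_emb_b):
    """Return 0 if image A scores at least as high as image B, else 1."""
    score_a = cosine(prompt_emb, image_emb_a)
    score_b = cosine(prompt_emb, image_emb_b)
    return 0 if score_a >= score_b else 1


def two_choice_accuracy(examples):
    """Percentage of examples where the scorer picks the labelled image.

    Each example is (prompt_emb, image_emb_a, image_emb_b, correct_index).
    """
    examples = list(examples)
    hits = sum(pick_image(p, a, b) == y for p, a, b, y in examples)
    return 100.0 * hits / len(examples)


# Toy embeddings, made up for illustration only (assumption, not real data).
toy_examples = [
    ([1.0, 0.0], [0.9, 0.1], [0.1, 0.9], 0),  # scorer picks A: correct
    ([0.0, 1.0], [0.8, 0.2], [0.2, 0.8], 1),  # scorer picks B: correct
    ([1.0, 1.0], [1.0, 0.0], [1.0, 0.9], 0),  # scorer picks B: wrong
]
accuracy = two_choice_accuracy(toy_examples)
```

On the toy data the scorer gets two of three pairs right. The third pair shows the failure mode that adversarial pairs are meant to expose: the wrong image is closer to the prompt in embedding space even though it is labelled as the incorrect one.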

Jialuo Li, Wenhao Chai, Xingyu Fu, Haiyang Xu, Saining Xie • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Scientific Image Synthesis | Science-T2I (test) | -- | 18 |
| Two-choice selection | Science-T2I Simple | Physics Accuracy: 94.92 | 11 |
| Two-choice selection | Science-T2I Complex | Accuracy (Physics): 86.89 | 11 |
| Text-to-Image Synthesis | Science-T2I C (test) | SciScore: 32.31 | 9 |
| Text-to-Image Synthesis | Science-T2I S (test) | SciScore: 30.95 | 9 |
| Text-to-Image Generation | User Study | User Study Score: 7.14 | 5 |