
Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization

About

This paper introduces Story-Iter, a new training-free iterative paradigm for enhancing long-story generation. Unlike existing methods that rely on fixed reference images to construct a complete story, our approach features a novel external iterative paradigm that extends beyond the internal iterative denoising steps of diffusion models: each generated image is continuously refined by incorporating all reference images from the previous round. To achieve this, we propose a plug-and-play, training-free global reference cross-attention (GRCA) module that models all reference frames with global embeddings, ensuring semantic consistency across long sequences. By progressively incorporating holistic visual context and text constraints, our iterative paradigm enables precise generation with fine-grained interactions, optimizing the story visualization step by step. Extensive experiments on the official story visualization dataset and our long-story benchmark demonstrate that Story-Iter achieves state-of-the-art performance in long-story visualization (up to 100 frames), excelling in both semantic consistency and fine-grained interactions.
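To make the two ideas above more concrete, here is a toy sketch in Python (NumPy). It assumes that the GRCA module behaves like standard scaled dot-product cross-attention whose keys and values come from one global embedding per reference frame. It also assumes that each external round regenerates every frame using all frames from the previous round as references. This is not the authors' implementation. The functions `global_reference_cross_attention` and `iterate_story`, the random projection matrices, the single-vector "frames", and the linear blend standing in for diffusion-based regeneration are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_reference_cross_attention(queries, ref_embeddings, w_q, w_k, w_v):
    """Hypothetical GRCA-style layer: scaled dot-product cross-attention from
    one frame's tokens to the global embeddings of ALL reference frames.

    queries:        (n_tokens, d_model) features of the frame being generated
    ref_embeddings: (n_refs, d_model)   one global embedding per reference frame
    w_q, w_k:       (d_model, d_head)   hypothetical query/key projections
    w_v:            (d_model, d_model)  hypothetical value projection
    """
    q = queries @ w_q
    k = ref_embeddings @ w_k
    v = ref_embeddings @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_tokens, n_refs)
    weights = softmax(scores, axis=-1)       # each token attends over every reference
    return weights @ v, weights


def iterate_story(frames, n_rounds, w_q, w_k, w_v, mix=0.5):
    """Toy external iteration loop: in every round, each frame is refined
    using all frames from the previous round as the reference set.

    Each "frame" is a single d_model vector standing in for a generated image;
    the linear blend is a hypothetical stand-in for diffusion regeneration.
    """
    frames = np.asarray(frames, dtype=float)
    for _ in range(n_rounds):
        refs = frames.copy()  # freeze the previous round's outputs as references
        new_frames = []
        for frame in refs:
            attended, _ = global_reference_cross_attention(
                frame[None, :], refs, w_q, w_k, w_v
            )
            new_frames.append((1 - mix) * frame + mix * attended[0])
        frames = np.stack(new_frames)
    return frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_frames, d = 6, 8
    frames = rng.normal(size=(n_frames, d))
    w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    refined = iterate_story(frames, n_rounds=3, w_q=w_q, w_k=w_k, w_v=w_v)
    print(refined.shape)  # (6, 8)
```

In this sketch, the reference set is rebuilt from the previous round's outputs rather than fixed up front. As a result, every frame attends to every other frame of the story, not just to a few anchor images.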

Jiawei Mao, Xiaoke Huang, Yunfei Xie, Yuanqi Chang, Mude Hui, Bingjie Xu, Zeyu Zheng, Zirui Wang, Cihang Xie, Yuyin Zhou• 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Cinematic Story Generation | ViStoryBench | CSD (Cross) | 0.325 | 24 |
| Visual Storytelling | ViStoryBench Lite 2025 | CSD (Cross) | 0.518 | 21 |
| Story Visualization | StorySalon long stories (test) | CLIP-T | 0.318 | 13 |
| Story Visualization | StorySalon regular-length (test) | CLIP-T | 0.31 | 10 |
| Video Generation | FilMaster evaluation suite | Script Faithfulness (SF) | 3.75 | 9 |
| Regular-Length Story Visualization | StoryGen Regular-Length Story Visualization (Human Evaluation) | Alignment | 4.06 | 8 |
| Long Story Visualization | StoryGen Human Evaluation Set Long Story Visualization | Alignment | 4.35 | 7 |
| Subject-consistent image generation | StoryGen Human Evaluation Set Subject-Consistent Image Generation | Alignment | 4.2 | 6 |
| Subject-consistent image generation | Subject-consistent image generation benchmark (test) | CLIP-T Score | 0.332 | 6 |
| Story Generation | ViStoryBench 2025 | CSD (Style) Cross | 45.6 | 5 |
Showing 10 of 19 rows
