TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation

About

Most text-to-video (T2V) generative models produce single-scene video clips that depict an entity performing a particular action (e.g., 'a red panda climbing a tree'). However, generating multi-scene videos is important because they are ubiquitous in the real world (e.g., 'a red panda climbing a tree' followed by 'the red panda sleeps on the top of the tree'). To generate multi-scene videos from a pretrained T2V model, we introduce a simple and effective framework called Time-Aligned Captions (TALC). Specifically, we enhance the text-conditioning mechanism in the T2V architecture to recognize the temporal alignment between video scenes and scene descriptions. For instance, we condition the visual features of the earlier and later scenes of the generated video on the representations of the first scene description (e.g., 'a red panda climbing a tree') and the second scene description (e.g., 'the red panda sleeps on the top of the tree'), respectively. As a result, we show that the T2V model can generate multi-scene videos that adhere to the multi-scene text descriptions and remain visually consistent (e.g., in entity and background). Further, we finetune the pretrained T2V model on multi-scene video-text data using the TALC framework. The TALC-finetuned model outperforms the baseline, achieving a relative gain of 29% in the overall score, which averages visual consistency and text adherence under human evaluation.
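The time-aligned conditioning idea can be illustrated with a minimal pure-Python sketch. It assumes that frames are split into contiguous, near-equal segments (one per scene) and uses a toy single-head cross-attention in which caption token embeddings serve directly as keys and values, with no learned projections. The function names (`scene_index_for_frames`, `talc_cross_attention`, `merged_caption_cross_attention`) are hypothetical illustrations, not the authors' code, and the "merged captions" variant is only an assumed contrast in which every frame attends to all scene descriptions at once.

```python
import math
from typing import List

Vector = List[float]


def scene_index_for_frames(num_frames: int, num_scenes: int) -> List[int]:
    """Assign each frame to a scene using contiguous, near-equal segments.

    The equal-split rule is an assumption made for this sketch.
    """
    if num_scenes <= 0 or num_frames < num_scenes:
        raise ValueError("need at least one frame per scene")
    return [i * num_scenes // num_frames for i in range(num_frames)]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Toy single-head scaled dot-product attention of one query over text tokens."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for j, x in enumerate(v):
            out[j] += (w / total) * x
    return out


def talc_cross_attention(frame_feats: List[Vector],
                         scene_tokens: List[List[Vector]]) -> List[Vector]:
    """Time-aligned: each frame attends only to the tokens of its own scene caption."""
    assignment = scene_index_for_frames(len(frame_feats), len(scene_tokens))
    return [attend(f, scene_tokens[s], scene_tokens[s])
            for f, s in zip(frame_feats, assignment)]


def merged_caption_cross_attention(frame_feats: List[Vector],
                                   scene_tokens: List[List[Vector]]) -> List[Vector]:
    """Assumed contrast: every frame attends to all scene captions concatenated."""
    all_tokens = [tok for tokens in scene_tokens for tok in tokens]
    return [attend(f, all_tokens, all_tokens) for f in frame_feats]


if __name__ == "__main__":
    frames = [[0.5, 0.5]] * 4  # four identical toy frame features
    scenes = [
        [[1.0, 0.0], [1.0, 0.0]],  # tokens of scene description 1
        [[0.0, 1.0], [0.0, 1.0]],  # tokens of scene description 2
    ]
    print(talc_cross_attention(frames, scenes))
    # [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(merged_caption_cross_attention(frames, scenes))
    # [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
```

In this toy setting, the time-aligned version gives the earlier frames a text signal drawn purely from the first description and the later frames one drawn purely from the second, whereas the merged variant blends both descriptions into every frame.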

Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang • 2024

Related benchmarks

Task	Dataset	Metric	Result	
Text-to-Video Generation	VBench	Quality Score	62.5	209
Video Generation	User Study (test)	Video Quality Score	12.31	8
Multi-scene video generation	Multi-scene evaluation dataset 1.0 (test)	Visual Consistency	67.47	5
Auto-regressive scene extension	T2V-CompBench	Action Binding Score	2.15	5
Auto-regressive scene extension	EvalCrafter	VQA_A	3.72	5

