First Frame Is the Place to Go for Video Content Customization

About

What role does the first frame play in video generation models? Traditionally, it's viewed as the spatial-temporal starting point of a video, merely a seed for subsequent animation. In this work, we reveal a fundamentally different perspective: video models implicitly treat the first frame as a conceptual memory buffer that stores visual entities for later reuse during generation. Leveraging this insight, we show that it's possible to achieve robust and generalized video content customization in diverse scenarios, using only 20-50 training examples without architectural changes or large-scale finetuning. This unveils a powerful, overlooked capability of video generation models for reference-based video customization.

Jingxi Chen, Zongxia Li, Zhichao Liu, Guangyao Shi, Xiyang Wu, Fuxiao Liu, Cornelia Fermuller, Brandon Y. Feng, Yiannis Aloimonos • 2025

Related benchmarks

Task                          Dataset            Result                   Rank
Image-to-Video Generation     VBench             Motion Smoothness: 0.98  12
Video Content Customization   User Study (test)  Overall Quality: 4.28    4

Other info

GitHub
