VGMShield: Mitigating Misuse of Video Generative Models

About

With the rapid advancement of video generation, people can conveniently use video generation models to create videos tailored to their specific desires. As a result, there are growing concerns about the potential misuse of video generation to spread illegal content and misinformation. In this work, we introduce VGMShield: a set of straightforward but effective mitigations across the lifecycle of fake video generation. We start with fake video detection, asking whether generated videos have unique characteristics that let us differentiate them from real videos. We then investigate fake video source tracing, which maps a fake video back to the model that generated it. For both tasks, we propose leveraging pre-trained models that focus on spatial-temporal dynamics as the backbone for identifying inconsistencies in videos. Specifically, we analyze fake videos from the perspective of the generation process. Based on observed attention shifts, motion variations, and frequency fluctuations, we identify common patterns in generated videos. These patterns serve as the foundation for our experiments on fake video detection and source tracing. Through experiments on seven state-of-the-art open-source models, we demonstrate that current models still cannot reliably reproduce spatial-temporal relationships; as a result, we can accomplish both detection and source tracing with over 90% accuracy. Furthermore, anticipating future improvements in generative models, we propose a prevention method that adds invisible perturbations to the query images so that the generated videos look unrealistic. Together with detection and tracing, this multi-faceted set of solutions can effectively mitigate misuse of video generative models.
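The abstract does not say how the invisible perturbations are computed. One common way to craft such perturbations, assumed here purely for illustration, is targeted projected gradient descent (PGD) under an L-infinity budget against the image encoder of an image-to-video model. The sketch below replaces that encoder with a hypothetical toy linear map `encode(W, x)`; the names `pgd_perturb` and `target_emb` and the values of `eps`, `alpha`, and `steps` are illustrative choices, not taken from the paper.

```python
import random

# ASSUMPTION: hypothetical stand-in for an image-to-video model's image encoder,
# a linear map from a flattened image (pixels in [0, 1]) to an embedding.
def encode(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def loss_and_grad(W, x_adv, target_emb):
    """Return ||W x_adv - target_emb||^2 and its gradient 2 W^T (W x_adv - target_emb)."""
    diff = [e - t for e, t in zip(encode(W, x_adv), target_emb)]
    loss = sum(d * d for d in diff)
    grad = [2 * sum(W[i][j] * diff[i] for i in range(len(W))) for j in range(len(x_adv))]
    return loss, grad


def pgd_perturb(W, x, target_emb, eps=8 / 255, alpha=2 / 255, steps=20):
    """Targeted L-infinity PGD (an assumed technique, not the paper's confirmed method):
    pull the encoder embedding of x toward target_emb while keeping every
    pixel change within eps."""
    x_adv = list(x)
    for _ in range(steps):
        _, grad = loss_and_grad(W, x_adv, target_emb)
        # Signed-gradient descent step: we minimize the distance to the target embedding.
        x_adv = [xa - alpha * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        # Project onto the eps-ball around the clean image (the "invisible" budget)...
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        # ...and onto the valid pixel range.
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv


# Usage on random toy data: 16 "pixels", 4-dimensional embedding.
random.seed(0)
W = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
x = [random.random() for _ in range(16)]
target_emb = encode(W, [random.random() for _ in range(16)])  # hypothetical target
x_adv = pgd_perturb(W, x, target_emb)
print("max |delta|:", max(abs(a - b) for a, b in zip(x_adv, x)))
print("loss before:", loss_and_grad(W, x, target_emb)[0])
print("loss after: ", loss_and_grad(W, x_adv, target_emb)[0])
```

In a real pipeline, the toy `encode` would be replaced by the model's actual encoder and the gradient obtained by automatic differentiation. The sign step followed by projection is what keeps each pixel change bounded by `eps`, which is the usual sense in which such perturbations are called invisible.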

Yan Pang, Baicheng Chen, Yang Zhang, Tianhao Wang · 2024

Related benchmarks

Task	Dataset	Metric	Result
Image-to-Video Generation	CelebV-Text	ISM	55.4	21
Image-to-Video Generation	UCF101	ISM	36.1	21
Identity Protection for Image-to-Video Generation	Wan TI2V 5B 2.2	ISM	0.628	7
Image-to-Video Protection	CelebV-Text, DynamiCrafter (200 videos)	ISM	0.322	6
Image-to-Video Protection	CelebV-Text, evaluated on OpenSora 1.2 (200 videos)	ISM	0.566	6
