Target-Aware Video Diffusion Models

About

We present a target-aware video diffusion model that generates videos from an input image, in which an actor interacts with a specified target while performing a desired action. The target is defined by a segmentation mask, and the action is described through a text prompt. Our key motivation is to incorporate target awareness into video generation, enabling actors to perform directed actions on designated objects. This enables video diffusion models to act as motion planners, producing plausible predictions of human-object interactions by leveraging the priors of large-scale video generative models. We build our target-aware model by extending a baseline model to incorporate the target mask as an additional input. To enforce target awareness, we introduce a special token that encodes the target's spatial information within the text prompt. We then fine-tune the model with our curated dataset using an additional cross-attention loss that aligns the cross-attention maps associated with this token with the input target mask. To further improve performance, we selectively apply this loss to the most semantically relevant attention regions and transformer blocks. Experimental results show that our target-aware model outperforms existing solutions in generating videos where actors interact accurately with the specified targets. We further demonstrate its efficacy in two downstream applications: zero-shot 3D HOI motion synthesis with physical plausibility and long-term video content creation.
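The cross-attention loss described above can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so everything here is an assumption: the binary cross-entropy formulation, the function names `cross_attention_loss` and `total_loss`, the `weight` parameter, and the representation of attention maps as plain 2D lists are all hypothetical and are not taken from the authors' implementation.

```python
import math


def cross_attention_loss(attn_logits, target_mask, eps=1e-7):
    """Binary cross-entropy between one attention map and the target mask.

    attn_logits: 2D list (H x W) of raw cross-attention scores for the
                 special target token (hypothetical representation).
    target_mask: 2D list (H x W) of 0/1 values, assumed to be already
                 resized to the attention resolution.
    """
    total = 0.0
    count = 0
    for logit_row, mask_row in zip(attn_logits, target_mask):
        for logit, m in zip(logit_row, mask_row):
            p = 1.0 / (1.0 + math.exp(-logit))   # squash score into (0, 1)
            p = min(max(p, eps), 1.0 - eps)      # avoid log(0)
            total += -(m * math.log(p) + (1 - m) * math.log(1.0 - p))
            count += 1
    return total / count


def total_loss(diffusion_loss, attn_by_block, target_mask, selected_blocks, weight):
    """Add the attention loss, averaged over selected blocks only.

    attn_by_block:   dict mapping a transformer block index to its attention map.
    selected_blocks: block indices where the loss is applied (assumed to be
                     chosen in advance; the selection rule is not modeled here).
    """
    losses = [cross_attention_loss(attn_by_block[b], target_mask)
              for b in selected_blocks]
    return diffusion_loss + weight * sum(losses) / len(losses)


if __name__ == "__main__":
    mask = [[0, 1], [0, 1]]                  # target occupies the right column
    aligned = [[-4.0, 4.0], [-4.0, 4.0]]     # attention concentrated on the target
    misaligned = [[4.0, -4.0], [4.0, -4.0]]  # attention concentrated elsewhere
    print(cross_attention_loss(aligned, mask))
    print(cross_attention_loss(misaligned, mask))
```

In this sketch, an attention map concentrated on the masked region yields a lower loss than one concentrated elsewhere, which is the alignment behavior the abstract describes. Restricting the sum to `selected_blocks` mirrors the idea of applying the loss selectively to certain transformer blocks.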

Taeksoo Kim, Hanbyul Joo • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Generation | VBench | -- | 126
Image-to-Video Generation | VBench | Motion Smoothness: 0.991 | 12
Image-to-Video Generation | InterGenEval (Synthetic (60 pairs) and Real (58 pairs)) | KISA: 0.465 | 5
Controllable Text-to-Video Generation | Targeting Evaluation Dataset | Contact Score: 0.878 | 4
Target-Oriented Video Generation | 80 Interaction Scenes | Contact Score: 0.878 | 4
