
Tuning-free Visual Effect Transfer across Videos

About

We present RefVFX, a new framework that transfers complex temporal effects from a reference video onto a target video or image in a feed-forward manner. While existing methods excel at prompt-based or keyframe-conditioned editing, they struggle with temporal effects such as dynamic lighting changes or character transformations, which are difficult to describe via text or static conditions. Transferring a video effect is challenging, as the model must integrate the new temporal dynamics with the input video's existing motion and appearance. To address this, we introduce a large-scale dataset of triplets, where each triplet consists of a reference effect video, an input image or video, and a corresponding output video depicting the transferred effect. Creating this data is non-trivial, especially the video-to-video effect triplets, which do not exist naturally. To generate these, we propose a scalable automated pipeline that creates high-quality paired videos designed to preserve the input's motion and structure while transforming it according to a fixed, repeatable effect. We then augment this data with image-to-video effects derived from LoRA adapters and code-based temporal effects generated through programmatic composition. Building on our new dataset, we train our reference-conditioned model using recent text-to-video backbones. Experimental results demonstrate that RefVFX produces visually consistent and temporally coherent edits, generalizes across unseen effect categories, and outperforms prompt-only baselines in both quantitative metrics and human preference. See our website at https://snap-research.github.io/RefVFX/
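The triplet structure described in the abstract can be sketched as a simple data record. This is an illustrative sketch only; the field names and media representations below are assumptions, not part of the RefVFX release.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical stand-ins for decoded media; the abstract does not
# specify a storage format, so frames are modeled as nested lists here.
Image = List[list]   # e.g. an H x W x 3 pixel array
Video = List[Image]  # e.g. a sequence of frames

@dataclass
class EffectTriplet:
    """One training example as described in the abstract: a reference
    effect video, an input image or video, and the output video
    depicting the effect transferred onto that input."""
    reference_effect: Video
    input_media: Union[Image, Video]  # image-to-video or video-to-video case
    output_with_effect: Video
```

A video-to-video example pairs two clips with matched motion and structure, while an image-to-video example (e.g. from a LoRA-derived effect) uses a single frame as `input_media`.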

Maxwell Jones, Rameen Abdal, Or Patashnik, Ruslan Salakhutdinov, Sergey Tulyakov, Jun-Yan Zhu, Kuan-Chieh Jackson Wang • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Video-to-Video Generation | Neural V2V | AES | 0.5649 | 5
Image-to-Video Generation | I2V | AES | 56.07 | 4
Reference-based Video Effect Transfer | Neural V2V | Input Similarity | 0.8568 | 4
Video-to-Video Generation | Code Based V2V | AES | 0.4802 | 4
Image-to-Video Generation | Image-to-Video (I2V) unseen LoRA effects (val) | Ref Video Adherence (Win Rate) | 81.5 | 3
Reference-based Video Effect Transfer | Code Based V2V | Input Similarity | 94.79 | 3
Image-to-Video | RefVFX | First Frame Similarity | 0.7698 | 3
