Tuning-free Visual Effect Transfer across Videos
About
We present RefVFX, a new framework that transfers complex temporal effects from a reference video onto a target video or image in a feed-forward manner. While existing methods excel at prompt-based or keyframe-conditioned editing, they struggle with dynamic temporal effects, such as changing lighting or character transformations, that are difficult to describe via text or static conditions. Transferring a video effect is challenging because the model must integrate the new temporal dynamics with the input video's existing motion and appearance.

To address this, we introduce a large-scale dataset of triplets, each consisting of a reference effect video, an input image or video, and a corresponding output video depicting the transferred effect. Creating this data is non-trivial, especially for the video-to-video effect triplets, which do not occur naturally. To generate them, we propose a scalable automated pipeline that creates high-quality paired videos, preserving the input's motion and structure while transforming it according to a fixed, repeatable effect. We then augment this data with image-to-video effects derived from LoRA adapters and with code-based temporal effects generated through programmatic composition.

Building on this dataset, we train a reference-conditioned model on top of recent text-to-video backbones. Experimental results demonstrate that RefVFX produces visually consistent and temporally coherent edits, generalizes to unseen effect categories, and outperforms prompt-only baselines in both quantitative metrics and human preference. See our website at https://snap-research.github.io/RefVFX/
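The paper does not release its effect code, but to make "code-based temporal effects generated through programmatic composition" concrete, here is a minimal sketch of one such effect and of composing two effects by chaining. The function names and the specific effect (a linear fade to black over a flipped clip) are illustrative assumptions, not the actual effects used in the dataset; a video here is a NumPy tensor of shape `(T, H, W, C)`.

```python
import numpy as np

def apply_fade_to_black(frames: np.ndarray) -> np.ndarray:
    """Linearly darken a clip from full brightness to black.

    frames: (T, H, W, C) uint8 video tensor.
    Returns a new tensor of the same shape with the effect applied.
    """
    t = frames.shape[0]
    # Per-frame gain ramps from 1.0 (first frame) down to 0.0 (last frame),
    # broadcast over the spatial and channel dimensions.
    gains = np.linspace(1.0, 0.0, t).reshape(t, 1, 1, 1)
    return (frames.astype(np.float32) * gains).astype(np.uint8)

def flip_then_fade(frames: np.ndarray) -> np.ndarray:
    """Programmatic composition: chain a horizontal flip with the fade."""
    return apply_fade_to_black(frames[:, :, ::-1, :])
```

Because each effect is a pure function from clip to clip, arbitrarily many effects can be composed this way, and the (input clip, transformed clip) pair forms a training example with a fixed, repeatable transformation.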
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video-to-Video Generation | Neural V2V | AES | 0.5649 | 5 |
| Image-to-Video Generation | I2V | AES | 56.07 | 4 |
| Reference-based Video Effect Transfer | Neural V2V | Input Similarity | 0.8568 | 4 |
| Video-to-Video Generation | Code Based V2V | AES | 0.4802 | 4 |
| Image-to-Video Generation | Image-to-Video (I2V) unseen LoRA effects (val) | Ref Video Adherence (Win Rate) | 81.5 | 3 |
| Reference-based Video Effect Transfer | Code Based V2V | Input Similarity | 94.79 | 3 |
| Image-to-Video | RefVFX | First Frame Similarity | 0.7698 | 3 |