BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models

About

Autoregressive video models are promising for world modeling via next-frame prediction, but they suffer from exposure bias: a mismatch between training on clean contexts and inference on self-generated frames, causing errors to compound and quality to drift over time. We introduce Backwards Aggregation (BAgger), a self-supervised scheme that constructs corrective trajectories from the model's own rollouts, teaching it to recover from its mistakes. Unlike prior approaches that rely on few-step distillation and distribution-matching losses, which can hurt quality and diversity, BAgger trains with standard score or flow matching objectives, avoiding large teachers and long-chain backpropagation through time. We instantiate BAgger on causal diffusion transformers and evaluate on text-to-video, video extension, and multi-prompt generation, observing more stable long-horizon motion and better visual consistency with reduced drift.

Ryan Po, Eric Ryan Chan, Changan Chen, Gordon Wetzstein • 2025
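
The exposure-bias problem described in the abstract can be illustrated with a toy example. The sketch below is not the paper's model or training scheme: it assumes a hypothetical scalar "video" whose true dynamics are x[t+1] = a_true * x[t], and a learned model with a slightly wrong coefficient a_model. All names and values are illustrative assumptions. It compares the one-step error when the model always sees clean context (as in training) with the error when it feeds on its own outputs (as in inference).

```python
def teacher_forced_errors(a_true, a_model, x0, steps):
    """One-step error when the context is always the clean ground-truth frame."""
    errors = []
    x = x0
    for _ in range(steps):
        target = a_true * x          # true next frame
        pred = a_model * x           # model prediction from clean context
        errors.append(abs(pred - target))
        x = target                   # next context is again the clean frame
    return errors


def rollout_errors(a_true, a_model, x0, steps):
    """Error when the model conditions on its own previously generated frames."""
    errors = []
    x_true = x0
    x_gen = x0
    for _ in range(steps):
        x_true = a_true * x_true     # ground-truth trajectory
        x_gen = a_model * x_gen      # self-generated trajectory
        errors.append(abs(x_gen - x_true))
    return errors


if __name__ == "__main__":
    # Hypothetical values: a 2% error in the learned coefficient.
    tf = teacher_forced_errors(a_true=1.0, a_model=1.02, x0=1.0, steps=50)
    ro = rollout_errors(a_true=1.0, a_model=1.02, x0=1.0, steps=50)
    print("teacher-forced error at last step:", tf[-1])
    print("rollout error at last step:", ro[-1])
```

In this toy setting the teacher-forced error stays at the same small value at every step, while the rollout error grows with the horizon because each prediction inherits the mistakes of the previous one. This is the drift that, according to the abstract, BAgger targets by training on corrective trajectories built from the model's own rollouts.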

Related benchmarks

Task                            Dataset                  Result                      Rank
Text-to-Video Generation        VBench                   Subject Consistency 84.05   6
Long text-to-video generation   VBench 50s long videos   Motion Quality 87.59        4
Text-to-Video                   MiraData Pexels          Smoothness 98.61            4