
Transition Models: Rethinking the Generative Learning Objective

About

A fundamental dilemma in generative modeling persists: iterative diffusion models achieve outstanding fidelity, but at a significant computational cost, while efficient few-step alternatives are constrained by a hard quality ceiling. This conflict between generation steps and output quality arises from restrictive training objectives that focus exclusively on either infinitesimal dynamics (PF-ODEs) or direct endpoint prediction. We address this challenge by introducing an exact, continuous-time dynamics equation that analytically defines state transitions across any finite time interval. This leads to a novel generative paradigm, Transition Models (TiM), which adapt to arbitrary-step transitions, seamlessly traversing the generative trajectory from single leaps to fine-grained refinement with more steps. Despite having only 865M parameters, TiM achieves state-of-the-art performance, surpassing leading models such as SD3.5 (8B parameters) and FLUX.1 (12B parameters) across all evaluated step counts. Importantly, unlike previous few-step generators, TiM demonstrates monotonic quality improvement as the sampling budget increases. Additionally, when employing our native-resolution strategy, TiM delivers exceptional fidelity at resolutions up to 4096x4096.

Zidong Wang, Yiyuan Zhang, Xiaoyu Yue, Xiangyu Yue, Yangguang Li, Wanli Ouyang, Lei Bai • 2025

Related benchmarks

Task                                  Dataset                         Result                 Rank
Class-conditional Image Generation    ImageNet 256x256 (val)          FID 3.26               427
Image Generation                      ImageNet 256x256                --                     359
Image Generation                      ImageNet 256x256 (val)          FID 7.11               340
Class-conditional generation          ImageNet 256 x 256 1k (val)     IS 210.3               102
Text-to-Image Generation              GenEval 1.0 (test)              Overall Score 77.97    85
Image Generation                      ImageNet 256x256 (test)         FID 3.26               54
Conditional Image Generation          ImageNet 256px 2012 (val)       FID 3.26               50
