
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration

About

Diffusion transformers (DiT) have become the de facto choice for generating high-quality images and videos, largely due to their scalability, which enables the construction of larger models for enhanced performance. However, the increased size of these models leads to higher inference costs, making them less attractive for real-time applications. We present Fast-FORward CAching (FORA), a simple yet effective approach designed to accelerate DiT by exploiting the repetitive nature of the diffusion process. FORA implements a caching mechanism that stores and reuses intermediate outputs from the attention and MLP layers across denoising steps, thereby reducing computational overhead. This approach does not require model retraining and integrates seamlessly with existing transformer-based diffusion models. Experiments show that FORA can speed up diffusion transformers several times over while only minimally affecting performance metrics such as the Inception Score (IS) and FID. By enabling faster processing with minimal trade-offs in quality, FORA represents a significant advancement in deploying diffusion transformers for real-time applications. Code will be made publicly available at: https://github.com/prathebaselva/FORA.
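To make the caching idea concrete, here is a minimal pure-Python sketch of reusing attention and MLP outputs across denoising steps. It is not the authors' implementation: the names `CachedBlock`, `run_denoising`, and `cache_interval`, the toy layer functions, and the fixed recompute schedule are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

Vector = List[float]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, standing in for a residual connection."""
    return [x + y for x, y in zip(a, b)]


class CachedBlock:
    """Hypothetical transformer block that can reuse its layer outputs."""

    def __init__(self, attn: Callable[[Vector], Vector],
                 mlp: Callable[[Vector], Vector]) -> None:
        self.attn = attn
        self.mlp = mlp
        self.attn_cache: Optional[Vector] = None
        self.mlp_cache: Optional[Vector] = None
        self.compute_calls = 0  # number of full (non-cached) evaluations

    def forward(self, x: Vector, recompute: bool) -> Vector:
        if recompute or self.attn_cache is None or self.mlp_cache is None:
            # Full pass: run both layers and store their outputs.
            self.attn_cache = self.attn(x)
            x = add(x, self.attn_cache)
            self.mlp_cache = self.mlp(x)
            self.compute_calls += 1
            return add(x, self.mlp_cache)
        # Cached pass: skip both layers and reuse the stored outputs.
        return add(add(x, self.attn_cache), self.mlp_cache)


def run_denoising(blocks: List[CachedBlock], x: Vector,
                  num_steps: int, cache_interval: int) -> Vector:
    """Recompute every `cache_interval` steps (assumed schedule), else reuse."""
    for step in range(num_steps):
        recompute = step % cache_interval == 0
        for block in blocks:
            x = block.forward(x, recompute)
    return x


# Toy stand-ins for real attention and MLP layers.
def toy_attn(v: Vector) -> Vector:
    return [0.1 * t for t in v]


def toy_mlp(v: Vector) -> Vector:
    return [0.05 * t for t in v]


blocks = [CachedBlock(toy_attn, toy_mlp) for _ in range(2)]
out = run_denoising(blocks, [1.0, 2.0], num_steps=10, cache_interval=3)
print(blocks[0].compute_calls)
```

With 10 steps and an interval of 3, each block runs its layers only at steps 0, 3, 6, and 9 and serves the remaining steps from the cache. In this sketch a larger interval skips more computation but reuses older outputs for longer, which is the speed versus quality trade-off the abstract refers to.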

Pratheba Selvaraju, Tianyu Ding, Tianyi Chen, Ilya Zharkov, Luming Liang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Class-conditional Image Generation | ImageNet | FID | 4.33 | 132 |
| Class-conditional Image Generation | ImageNet-1k (val) | FID | 3.88 | 68 |
| Text-to-Image Generation | ImageReward | ImageReward Score | 1.196 | 56 |
| Text-to-Image Generation | FLUX.1 (dev) | Image Reward | 0.9776 | 56 |
| Class-conditional Image Generation | ImageNet (val) | FID | 2.8 | 54 |
| Video Generation | VBench (test) | Semantic Score | 63.9 | 35 |
| Class-conditional image synthesis | ImageNet | Inception Score | 243.8 | 25 |
| Class-to-image generation | ImageNet | FID | 3.55 | 25 |
| Text-to-Image Generation | FLUX.1-schnell 1.0 (dev) | Latency (s) | 5.09 | 23 |
| Text-to-Video Generation | HunyuanVideo | LPIPS | 0.44 | 22 |
Showing 10 of 20 rows
