Real-Time Video Generation with Pyramid Attention Broadcast

About

We present Pyramid Attention Broadcast (PAB), a real-time, high-quality, training-free approach for DiT-based video generation. Our method is founded on the observation that the difference between attention outputs at adjacent steps of the diffusion process exhibits a U-shaped pattern, indicating significant redundancy. We mitigate this redundancy by broadcasting attention outputs to subsequent steps in a pyramid style: PAB applies a different broadcast strategy to each type of attention according to its variance, for best efficiency. We further introduce broadcast sequence parallel for more efficient distributed inference. PAB demonstrates up to 10.5x speedup across three models compared to baselines, achieving real-time generation for videos of up to 720p. We anticipate that our simple yet effective method will serve as a robust baseline and facilitate future research and applications in video generation.
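The broadcasting idea can be illustrated with a minimal sketch. It is not the authors' implementation: the class name, the step window, and the per-type broadcast ranges below are hypothetical values chosen for illustration, and the split into spatial, temporal, and cross attention is an assumption about the model architecture that the abstract itself does not spell out. The sketch recomputes every attention output at the start and end of sampling, where the U-shaped pattern suggests outputs change quickly, and reuses a cached output for several steps in the middle.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple

# Hypothetical broadcast ranges: number of consecutive steps that share one
# attention output. Attention types assumed to vary less get a longer range.
BROADCAST_RANGE = {"spatial": 2, "temporal": 4, "cross": 6}


@dataclass
class AttentionBroadcastCache:
    """Caches attention outputs and reuses them inside a stable step window."""

    window: Tuple[int, int]  # [start, end) steps where reuse is allowed (assumed)
    _cache: Dict[Tuple[str, int], Tuple[int, Any]] = field(default_factory=dict)
    computed: Dict[str, int] = field(
        default_factory=lambda: {kind: 0 for kind in BROADCAST_RANGE}
    )

    def attend(self, kind: str, layer: int, step: int,
               compute: Callable[[], Any]) -> Any:
        key = (kind, layer)
        in_window = self.window[0] <= step < self.window[1]
        if in_window and key in self._cache:
            cached_step, output = self._cache[key]
            # Reuse the cached output while it is still within its range.
            if step - cached_step < BROADCAST_RANGE[kind]:
                return output
        # Outside the window, or the cached output has expired: recompute.
        output = compute()
        self._cache[key] = (step, output)
        self.computed[kind] += 1
        return output


def simulate(total_steps: int = 30) -> Dict[str, int]:
    """Counts how many real attention computations each type performs."""
    cache = AttentionBroadcastCache(window=(3, total_steps - 3))
    for step in range(total_steps):
        for kind in BROADCAST_RANGE:
            # A stand-in for a real attention call; it returns a dummy value.
            cache.attend(kind, layer=0, step=step, compute=lambda: step)
    return cache.computed


if __name__ == "__main__":
    print(simulate())
```

Under these assumed settings, the attention type with the longest broadcast range performs the fewest real computations, which is where the saving comes from. In a real model the `compute` callable would wrap the attention module's forward pass, and the ranges would need to be tuned per model.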

Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang You • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Generation | VBench (test) | Semantic Score | 56.3 | 48 |
| Video Generation | Open-Sora 204 frames, 480P | Latency | 19.8 | 16 |
| Video Generation | Open-Sora-Plan 221 frames, 512×512 | Latency | 34.9 | 16 |
| Video Generation | Latte 16 frames, 512×512 | Latency | 7.9 | 16 |
| Video Generation | VBench CogVideoX v1.5 | Speedup | 1.41 | 12 |
| Text-to-Video Generation | HunyuanVideo 544p × 860p, 17 frames | VBench Score | 80.81 | 9 |
| Text-to-Video Generation | VBench | VBench Score | 76.95 | 9 |
| Video Generation | Latte 16 frames, 512×512 | VBench | 76.32 | 8 |
| Video Generation | Open-Sora-Plan 65 frames, 512×512 | VBench | 80.3 | 7 |
| Video Generation | Open-Sora 51 frames, 480P | VBench | 78.1 | 7 |
Showing 10 of 20 rows
