Allegro: Open the Black Box of Commercial-Level Video Generation Model
About
Significant advancements have been made in the field of video generation, with the open-source community contributing a wealth of research papers and tools for training high-quality models. However, despite these efforts, the available information and resources remain insufficient for achieving commercial-level performance. In this report, we open the black box and introduce **Allegro**, an advanced video generation model that excels in both quality and temporal consistency. We also highlight the current limitations in the field and present a comprehensive methodology for training high-performance, commercial-level video generation models, addressing key aspects such as data, model architecture, training pipeline, and evaluation. Our user study shows that Allegro surpasses existing open-source models and most commercial models, ranking just behind Hailuo and Kling. Code: https://github.com/rhymes-ai/Allegro , Model: https://huggingface.co/rhymes-ai/Allegro , Gallery: https://rhymes.ai/allegro_gallery .
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Text-to-Video Generation | VBench | Quality Score | 83.1 | 111 |
| Video Generation | UCF-101 (test) | Inception Score | 67.16 | 105 |
| Video Reconstruction | WebVid 10M | PSNR | 32.18 | 34 |
| 3D Scene Generation | WorldScore | Camera Control | 24.84 | 33 |
| Video Generation | SkyTimelapse (test) | FVD16 | 117.3 | 16 |
| Video Generation | WorldScore (test) | Average Score | 53.64 | 12 |
| Video Reconstruction | Panda-70M | PSNR | 31.7 | 10 |