
FlashWorld: High-quality 3D Scene Generation within Seconds

About

We propose FlashWorld, a generative model that produces 3D scenes from a single image or text prompt in seconds, 10 to 100$\times$ faster than previous work, with superior rendering quality. Our approach shifts from the conventional multi-view-oriented (MV-oriented) paradigm, which generates multi-view images for subsequent 3D reconstruction, to a 3D-oriented approach in which the model directly produces 3D Gaussian representations during multi-view generation. 3D-oriented methods ensure 3D consistency but typically suffer from poor visual quality. FlashWorld therefore combines a dual-mode pre-training phase with a cross-mode post-training phase, integrating the strengths of both paradigms. Specifically, leveraging the prior from a video diffusion model, we first pre-train a dual-mode multi-view diffusion model that jointly supports the MV-oriented and 3D-oriented generation modes. To bridge the quality gap in 3D-oriented generation, we further propose a cross-mode post-training distillation that matches the distribution of the consistent 3D-oriented mode to that of the high-quality MV-oriented mode. This enhances visual quality while maintaining 3D consistency, and it also reduces the number of denoising steps required at inference. In addition, we propose a strategy that leverages massive single-view images and text prompts during this process to improve the model's generalization to out-of-distribution inputs. Extensive experiments demonstrate the superiority and efficiency of our method.

Xinyang Li, Tengfei Wang, Zixiao Gu, Shengchuan Zhang, Chunchao Guo, Liujuan Cao • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Video Reconstruction | DAVIS | PSNR | 16.72 | 29 |
| Video Generation | General Scenes Image-to-Video | PSNR | 22.46 | 8 |
| 1-view-based novel view generation | RealEstate10K | PSNR | 20.18 | 7 |
| 1-view-based novel view generation | DL3DV-10K | PSNR | 16.02 | 7 |
| Single-image world generation | WorldScore Indoor | 3D Consistency | 83.57 | 7 |
| Single-image world generation | DL3DV | 3D Consistency | 76.74 | 7 |
| 4D Camera Control | PREBench Camera-only | Camera Rotation Error | 6.3761 | 7 |
| Camera-only motion control | VerseControl4D static | Overall Score | 81.8 | 4 |
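
Several rows above report PSNR. For readers unfamiliar with the metric, the following is a minimal sketch of its standard definition, $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$, in decibels. It is not the evaluation code behind these benchmarks, whose protocols (resolution, color space, averaging over frames or views) are not stated here. The function name `psnr`, the flat-list input, and the default 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    `reference` and `estimate` are flat sequences of pixel intensities;
    `max_value` is the largest possible intensity (255 for 8-bit images).
    These conventions are assumptions of this sketch.
    """
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)

    # Identical inputs have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a rendering closer to the reference; identical inputs give infinity under this definition.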
