SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

About

3D content generation has recently attracted significant research interest, driven by its critical applications in VR/AR and embodied AI. In this work, we tackle the challenging task of synthesizing multiple 3D assets within a single scene image. Concretely, our contributions are fourfold: (i) we present SceneGen, a novel framework that takes a scene image and corresponding object masks as input, simultaneously producing multiple 3D assets with geometry and texture. Notably, SceneGen operates with no need for extra optimization or asset retrieval; (ii) we introduce a novel feature aggregation module that integrates local and global scene information from visual and geometric encoders within the feature extraction module. Coupled with a position head, this enables the generation of 3D assets and their relative spatial positions in a single feedforward pass; (iii) we demonstrate SceneGen's direct extensibility to multi-image input scenarios. Despite being trained solely on single-image inputs, our architecture yields improved generation performance when multiple images are provided; and (iv) extensive quantitative and qualitative evaluations confirm the efficiency and robustness of our approach. We believe this paradigm offers a novel solution for high-quality 3D content generation, potentially advancing its practical applications in downstream tasks. The code and model will be publicly available at: https://mengmouxu.github.io/SceneGen.

Yanxu Meng, Haoning Wu, Ya Zhang, Weidi Xie • 2025

Related benchmarks

Task                 Dataset                         Result                Rank
3D Scene Generation  3D-Front (test)                 CD (Surface): 0.1432  12
3D Scene Generation  BlendSwap & Scenethesis (test)  CD-S: 0.1161          5
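
For readers unfamiliar with the metric, the sketch below illustrates one common form of the Chamfer distance, assuming that "CD" in the table refers to it. The function name, the use of unsquared Euclidean distances, and the averaging over each point set are illustrative assumptions; the paper's exact definition (squared vs. unsquared, sum vs. mean, sampling density, normalization) may differ, so values from this sketch are not comparable to the numbers listed above.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention (not taken from the paper): for each point, take the
    unsquared Euclidean distance to its nearest neighbour in the other set,
    average per set, and add the two directional terms.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def mean_nearest(src, dst):
        # Brute-force nearest neighbour: O(len(src) * len(dst)).
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(a, b) + mean_nearest(b, a)


# Toy point sets standing in for points sampled from two surfaces.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(pred, gt))  # 1.0
```

Lower values indicate closer agreement between the two point sets, and identical sets give 0. The brute-force search is only practical for small point sets; larger ones would normally use a spatial index such as a k-d tree.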