Pose-Aware Diffusion for 3D Generation

About

Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated objects, highlighting its robust ability to preserve precise spatial layouts.

Zihan Zhou, Luxi Chen, Jingzhi Zhou, Yuhao Wan, Min Zhao, Baoyu Fan, Chongxuan Li • 2026
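
The abstract describes unprojecting a monocular depth map into a partial point cloud that serves as a 3D geometric anchor. The sketch below illustrates that unprojection step only, assuming a standard pinhole camera model. The function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration; the page does not specify the camera model or how PAD implements this step.

```python
import numpy as np


def unproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (N, 3) point cloud in camera space.

    Assumes a pinhole camera (an assumption, not taken from the paper):
    fx, fy are focal lengths in pixels, (cx, cy) is the principal point.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (image x), v indexes rows (image y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Keep only pixels with a usable depth value.
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)


# Toy 4x4 depth map at 2 units, with one missing pixel.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
points = unproject_depth(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

The resulting points live in the camera (observation) frame rather than a canonical object frame, which is the property the abstract relies on: geometry anchored this way needs no separate rotation step to match the input view.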

Related benchmarks

Task	Dataset	Metric	Result	Rank
Posed Object Generation	Google Scanned Objects (GSO) (test)	Chamfer Distance (CD)	48.76	7
Compositional 3D Scene Generation	3D-FUTURE first 200 scenes (test)	Chamfer Distance (CD)	5.62e+4	4
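
Both rows report Chamfer Distance between generated and reference geometry. As a reminder of what the metric measures, here is a minimal sketch of one common variant: the symmetric sum of mean squared nearest-neighbor distances between two point sets. This page does not state which variant, sampling density, or scaling produced the numbers above, so values from this sketch are not directly comparable to them. The name `chamfer_distance` is illustrative.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Uses squared Euclidean distances and a brute-force O(N * M) search,
    which is fine for small clouds; other variants use unsquared
    distances or a different normalization.
    """
    # Pairwise squared distances, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Mean nearest-neighbor distance in each direction, then summed.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 1.0]])
cd = chamfer_distance(a, b)  # 0.5 + 0.5 = 1.0 for this toy pair
```

Lower is better. Because the metric depends on absolute point positions, a generated object that has the right shape but the wrong pose still scores poorly, which is why it suits pose-aligned evaluation.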
