Pose-Aware Diffusion for 3D Generation
About
Generating pose-aligned 3D objects is challenging due to the spatial mismatches and transformation ambiguities inherent in decoupled canonical-then-rotate paradigms. To this end, we introduce Pose-Aware Diffusion (PAD), a novel end-to-end diffusion framework that synthesizes 3D geometry directly within the observation space. By unprojecting monocular depth into a partial point cloud and explicitly injecting it as a 3D geometric anchor, PAD abandons canonical assumptions to enforce rigorous spatial supervision. This native generation intrinsically resolves pose ambiguity, producing high-fidelity pose-aligned assets. Extensive experiments demonstrate that PAD achieves superior geometric alignment and image-to-3D correspondence compared to state-of-the-art methods. Additionally, PAD naturally extends to compositional 3D scene reconstruction via a simple union of independently generated objects, highlighting its robust ability to preserve precise spatial layouts.
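The depth-unprojection step mentioned above can be illustrated with a minimal sketch. It assumes a standard pinhole camera with known intrinsics; the function name `unproject_depth`, its parameters (`fx`, `fy`, `cx`, `cy`, `mask`), and the use of NumPy are illustrative assumptions, not details taken from the PAD implementation.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy, mask=None):
    """Lift a depth map of shape (H, W) to camera-space points of shape (N, 3).

    Assumes a pinhole camera model (hypothetical helper, not PAD's code).
    Pixels with non-positive depth, or outside the optional object mask,
    are dropped, so the result is a partial point cloud.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    if mask is not None:
        valid &= mask.astype(bool)
    z = depth[valid]
    # Invert the pinhole projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Toy example: a 2x2 depth map with one missing (zero) pixel.
depth = np.array([[2.0, 2.0],
                  [2.0, 0.0]])
points = unproject_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

Because the points stay in the camera (observation) frame, a point cloud built this way carries the object's observed pose directly, which is the property the abstract relies on when it describes the cloud as a geometric anchor.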
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Posed Object Generation | Google Scanned Objects (GSO) (test) | Chamfer Distance (CD) | 48.76 | 7 |
| Compositional 3D Scene Generation | 3D-FUTURE first 200 scenes (test) | Chamfer Distance (CD) | 5.62e+4 | 4 |
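
For readers unfamiliar with the metric, the following is a minimal sketch of one common Chamfer Distance convention: the symmetric sum of mean squared nearest-neighbor distances between two point sets. The exact variant, sampling density, and scaling behind the numbers in the table are not stated here, so this sketch should not be expected to reproduce them; the function name `chamfer_distance` is an illustrative assumption.

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Uses squared Euclidean distances and averages over each set; other
    conventions (unsquared distances, different scaling) also exist.
    Brute-force O(N * M) memory, so only suitable for small point sets.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbor distance from a to b, plus from b to a.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Toy example: two single-point sets one unit apart.
a = np.array([[0.0, 0.0, 0.0]])
b = np.array([[1.0, 0.0, 0.0]])
cd = chamfer_distance(a, b)
```

Lower values indicate closer agreement between the generated and reference geometry; identical point sets give a distance of zero.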