ToLL: Topological Layout Learning with Asymmetric Cross-View Structural Distillation for 3D Scene Graph Generation Pretraining

About

3D Scene Graph (3DSG) generation plays a pivotal role in spatial understanding and affordance perception. To mitigate generalization issues from data scarcity, joint-embedding and generative proxy tasks are proposed to pre-train 3DSG representations on predicate label-free datasets. Currently, generative pre-training usually bypasses the semantic corruption caused by the geometric augmentations in joint-embedding, but cannot avoid a negative problem ``Geometric Shortcut." In this problem, exposing dense object spatial and scale priors will induce models to trivially reconstruct scenes by interpolating object positions, rather than learning the underlying topological constraints provided by edges. To address this issue, we propose a Topological Layout Learning (ToLL) for 3DSG generation pretraining framework. In detail, we design an Anchor-Conditioned Topological Geometry Reasoning. It adopts a recurrent GNN to recover the global layout of zero-centered subgraphs (the non-visible spatial features) by one anchor with sparse spatial prior. Considering the absence of spatial layout information within the objects, it creates an information bottleneck, compelling our model to recover the full scene layout by leveraging predicate representation learning. Moreover, we construct a Structural Multi-view Augmentation to avoid semantic corruption, enhancing 3DSG representations via self-distillation. The extensive experiments on special dataset demonstrate that our ToLL could often improve 3DSG pertaining quality, outperforming state-of-the-art baselines.

Yucheng Huang, Luping Ji, Xiangwei Jiang, Wen Li, Mao Ye• 2026

Related benchmarks

Task	Dataset	Result
Predicate Classification (PredCls)	3DSSG	mR@5067.6	35
Scene Graph Classification (SGCls)	3DSSG	mR@200.357	22
Triplet Prediction	3DSSG	mA@5066.58	15

Showing 3 of 3 rows

Other info

Follow for update

@wizwand_team Discord