
PanoVGGT: Feed-Forward 3D Reconstruction from Panoramic Imagery

About

Panoramic imagery offers a full 360° field of view and is increasingly common in consumer devices. However, it introduces non-pinhole distortions that challenge joint pose estimation and 3D reconstruction. Existing feed-forward models, built for perspective cameras, generalize poorly to this setting. We propose PanoVGGT, a permutation-equivariant Transformer framework that jointly predicts camera poses, depth maps, and 3D point clouds from one or more panoramas in a single forward pass. The model incorporates spherical-aware positional embeddings and a panorama-specific three-axis SO(3) rotation augmentation, enabling effective geometric reasoning in the spherical domain. To resolve the inherent global-frame ambiguity, we further introduce a stochastic anchoring strategy during training. In addition, we contribute PanoCity, a large-scale outdoor panoramic dataset with dense depth and 6-DoF pose annotations. Extensive experiments on PanoCity and standard benchmarks demonstrate that PanoVGGT achieves competitive accuracy, strong robustness, and improved cross-domain generalization. Code and dataset will be released.
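The paper's implementation is not yet public, but the three-axis SO(3) rotation augmentation it describes has a standard form for equirectangular panoramas: sample a rotation, map each output pixel to a direction on the unit sphere, pull it back through the inverse rotation, and resample the source image. A minimal numpy sketch under that assumption (the function names `rotation_matrix` and `rotate_equirect` are illustrative, not from the paper):

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """Compose a three-axis SO(3) rotation (Z-Y-X convention, assumed here)."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx

def rotate_equirect(img, R):
    """Rotate an equirectangular panorama (H x W x C) by R via
    nearest-neighbour remapping on the unit sphere."""
    H, W = img.shape[:2]
    v, u = np.mgrid[0:H, 0:W]
    lon = (u + 0.5) / W * 2 * np.pi - np.pi      # longitude in [-pi, pi)
    lat = np.pi / 2 - (v + 0.5) / H * np.pi      # latitude in (-pi/2, pi/2)
    # Unit direction for every output pixel.
    d = np.stack([np.cos(lat) * np.cos(lon),
                  np.cos(lat) * np.sin(lon),
                  np.sin(lat)], axis=-1)
    # Pull back through the inverse rotation: d @ R applies R.T to each vector.
    s = d @ R
    src_lon = np.arctan2(s[..., 1], s[..., 0])
    src_lat = np.arcsin(np.clip(s[..., 2], -1.0, 1.0))
    su = ((src_lon + np.pi) / (2 * np.pi) * W).astype(int) % W
    sv = ((np.pi / 2 - src_lat) / np.pi * H).astype(int).clip(0, H - 1)
    return img[sv, su]
```

A pure yaw rotation reduces to a horizontal roll of the image, while pitch and roll move the poles and exercise the non-pinhole distortion the abstract refers to; sampling all three axes at random is what makes the augmentation panorama-specific.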

Yijing Guo, Mengjun Chao, Luo Wang, Tianyang Zhao, Haizhao Dai, Yingliang Zhang, Jingyi Yu, Yujiao Shi • 2026

Related benchmarks

Task                        Dataset                  Metric                   Result   Rank
Depth Estimation            Matterport3D             delta1                   92.66    50
Depth Estimation            Stanford2D3D             Abs Rel                  0.0711   27
Depth Estimation            Pano3D GibsonV2          Absolute Relative Error  0.0833   24
Depth Estimation            PanoCity Outdoor         Abs Rel                  0.0196   12
Depth Estimation            Structured3D Indoor      Abs Rel Error            4        12
Camera pose estimation      Matterport3D Indoor      AUC@30                   45.9     5
Camera pose estimation      Stanford2D3D Indoor      AUC@30                   55.6     5
Camera pose estimation      PanoCity Outdoor         AUC@30                   0.949    5
Point Cloud Reconstruction  PanoCity                 Accuracy Mean            0.768    5
Point Cloud Reconstruction  Stanford2D3D v1 (test)   Accuracy Mean            21.09    5
Showing 10 of 12 rows
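For the pose rows above, AUC@30 is conventionally the area under the cumulative pose-error curve up to a 30° threshold, normalised so a perfect model scores 1 (the table reports some entries on a percentage scale and some as fractions, as listed by the source). A minimal sketch of that standard metric, with `auc_at_threshold` as an illustrative name:

```python
import numpy as np

def auc_at_threshold(errors_deg, max_deg=30.0, n_bins=300):
    """Area under the cumulative error curve up to max_deg, in [0, 1].

    errors_deg: per-sample pose errors in degrees.
    """
    errors = np.asarray(errors_deg, dtype=float)
    thresholds = np.linspace(0.0, max_deg, n_bins + 1)
    # Fraction of samples with error at or below each threshold.
    recall = np.array([(errors <= t).mean() for t in thresholds])
    # Trapezoidal integration, normalised by the threshold range.
    dt = thresholds[1] - thresholds[0]
    area = dt * (recall.sum() - 0.5 * (recall[0] + recall[-1]))
    return area / max_deg
```

Errors far beyond the threshold contribute nothing, so the metric rewards consistently small errors rather than a low mean alone.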
