Visual Implicit Geometry Transformer for Autonomous Driving

About

We introduce the Visual Implicit Geometry Transformer (ViGT), a geometric model for autonomous driving that estimates continuous 3D occupancy fields from surround-view camera rigs. ViGT is a step towards foundational geometric models for autonomous driving, prioritizing scalability, architectural simplicity, and generalization across diverse sensor configurations. It achieves this through a calibration-free architecture that lets a single model adapt to different sensor setups. Unlike general-purpose geometric foundation models, which focus on pixel-aligned predictions, ViGT estimates a continuous 3D occupancy field in bird's-eye view (BEV), addressing the domain-specific requirements of driving. ViGT infers geometry from multiple camera views directly in a single metric coordinate frame, providing a common representation for multiple geometric tasks. Unlike most existing occupancy models, we adopt a self-supervised training procedure that leverages synchronized image-LiDAR pairs, eliminating the need for costly manual annotations. We validate the scalability and generalizability of our approach by training our model on a mixture of five large-scale autonomous driving datasets (nuScenes, Waymo, nuPlan, ONCE, and Argoverse), achieving state-of-the-art performance on the pointmap estimation task with the best average rank across all evaluated baselines. We further evaluate ViGT on the Occ3D-nuScenes benchmark, where it achieves performance comparable to supervised methods. The source code is publicly available at https://github.com/whesense/ViGT.
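The abstract states that training is self-supervised from synchronized image-LiDAR pairs but does not say how LiDAR returns become occupancy targets. A common recipe in the occupancy literature, shown here only as a hypothetical sketch and not as ViGT's actual procedure, is to cast each LiDAR ray from the sensor: points sampled along the ray before the return are labeled free, and the return itself is labeled occupied. The helper names `lidar_occupancy_samples` and `occupancy_bce`, the parameters `n_free` and `surface_eps`, and the use of binary cross-entropy are all illustrative assumptions.

```python
import numpy as np


def lidar_occupancy_samples(points, origin, n_free=8, surface_eps=0.1, rng=None):
    """Hypothetical helper: turn LiDAR returns into (xyz, label) occupancy samples.

    points: (N, 3) LiDAR hits in a metric ego frame.
    origin: (3,) sensor position in the same frame.
    Returns xyz of shape (N * n_free + N, 3) and labels (1.0 = occupied, 0.0 = free).
    """
    rng = np.random.default_rng() if rng is None else rng
    points = np.asarray(points, dtype=float)
    origin = np.asarray(origin, dtype=float)

    rays = points - origin                                # ray from sensor to each hit
    dist = np.linalg.norm(rays, axis=1, keepdims=True)    # (N, 1) range of each hit
    dirs = rays / np.maximum(dist, 1e-9)                  # (N, 3) unit directions

    # Free space: random fractions of each ray, stopping surface_eps before the hit.
    t = rng.uniform(0.0, 1.0, size=(points.shape[0], n_free))
    free_range = np.maximum(dist[:, None, :] - surface_eps, 0.0)   # (N, 1, 1)
    free_xyz = origin + (t[..., None] * free_range) * dirs[:, None, :]  # (N, n_free, 3)

    # Occupied: the LiDAR return itself.
    xyz = np.concatenate([free_xyz.reshape(-1, 3), points], axis=0)
    labels = np.concatenate([np.zeros(points.shape[0] * n_free), np.ones(points.shape[0])])
    return xyz, labels


def occupancy_bce(logits, labels):
    """Numerically stable binary cross-entropy on occupancy logits (assumed loss)."""
    x = np.asarray(logits, dtype=float)
    y = np.asarray(labels, dtype=float)
    # max(x, 0) - x*y + log(1 + exp(-|x|)) equals BCE(sigmoid(x), y)
    return float(np.mean(np.maximum(x, 0.0) - x * y + np.log1p(np.exp(-np.abs(x)))))
```

In such a setup, the sampled `xyz` points would be queried against the predicted continuous occupancy field and the resulting logits scored against `labels`; because the field is continuous, the samples need not lie on any voxel grid.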

Arsenii Shirokov, Mikhail Kuznetsov, Danila Stepochkin, Egor Evdokimov, Daniil Glazkov, Nikolay Patakin, Anton Konushin, Dmitry Senushkin• 2026

Related benchmarks

Task	Dataset	Result
3D Occupancy Prediction	Occ3D-nuScenes (val)	--	213
Pointmap Estimation	nuScenes (test)	AbsRel 0.068	15
Pointmap Estimation	Argoverse 2 (AV2) (test)	AbsRel 0.131	15
Pointmap Estimation	ONCE (test)	AbsRel 0.169	15
Pointmap Estimation	NuPlan subsampled (test)	AbsRel 0.118	15
Pointmap Estimation	Waymo (test)	AbsRel 0.121	15
Pointmap Estimation	Aggregate (NuScenes, AV2, Waymo, ONCE, NuPlan)	Average Rank 1.8	9
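The table reports AbsRel (absolute relative error, lower is better) per dataset and an average rank over the five datasets. The snippet below is a minimal sketch of both quantities under stated assumptions: AbsRel is taken as the mean of |d̂ - d| / d over points with valid ground truth d > 0, where d is assumed to be each point's distance from the ego origin, and the average rank gives rank 1 to the lowest AbsRel on each dataset (tied methods share the mean rank) and averages across datasets. The exact evaluation protocol (distance definition, range limits, any alignment step) is not given on this page, and the function names and method labels are hypothetical.

```python
import numpy as np


def abs_rel(pred_points, gt_points):
    """AbsRel on per-point distance from the ego origin (assumed protocol)."""
    pred_d = np.linalg.norm(np.asarray(pred_points, dtype=float), axis=-1)
    gt_d = np.linalg.norm(np.asarray(gt_points, dtype=float), axis=-1)
    valid = gt_d > 0  # ignore points without a valid ground-truth return
    return float(np.mean(np.abs(pred_d[valid] - gt_d[valid]) / gt_d[valid]))


def ranks_ascending(values):
    """Rank values so that 1 = lowest (best AbsRel); ties get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean 1-based position of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def average_rank(scores):
    """scores: {dataset: {method: abs_rel}}; assumes every method appears on every dataset."""
    methods = sorted({m for per_ds in scores.values() for m in per_ds})
    totals = {m: 0.0 for m in methods}
    for per_ds in scores.values():
        if set(per_ds) != set(methods):
            raise ValueError("each dataset must report every method")
        names = list(per_ds)
        for name, rank in zip(names, ranks_ascending([per_ds[n] for n in names])):
            totals[name] += rank
    return {m: totals[m] / len(scores) for m in methods}
```

For example, `average_rank({"d1": {"method_a": 0.1, "method_b": 0.2}, "d2": {"method_a": 0.3, "method_b": 0.1}})` gives each hypothetical method a rank of 1.5. Averaging ranks rather than raw AbsRel keeps a dataset with larger errors from dominating the aggregate.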
