E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training

About

Self-supervised pre-training has driven rapid progress in foundation models for language, 2D images, and video, yet remains largely unexplored for learning 3D-aware representations from multi-view images. In this paper, we present E-RayZer, a self-supervised 3D vision model that learns geometrically grounded representations directly from unlabeled images. Unlike prior self-supervised methods such as RayZer, which infer 3D indirectly through latent-space view synthesis, E-RayZer operates directly in 3D space, performing self-supervised 3D reconstruction with Explicit geometry. This formulation eliminates shortcut solutions and yields representations that are 3D-aware. To ensure convergence and scalability, we introduce a fine-grained learning curriculum that organizes training from easy to hard samples and harmonizes heterogeneous data sources without any supervision. Experiments show that E-RayZer significantly outperforms RayZer on pose estimation and matches or sometimes surpasses fully supervised reconstruction models such as VGGT. Furthermore, its learned representations outperform leading visual pre-training models (e.g., DINOv3, CroCo v2, VideoMAE V2, and RayZer) on 3D downstream tasks, establishing E-RayZer as a promising paradigm for spatial visual pre-training.

Qitao Zhao, Hao Tan, Qianqian Wang, Sai Bi, Kai Zhang, Kalyan Sunkavalli, Shubham Tulsiani, Hanwen Jiang • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Novel View Synthesis | DL3DV | PSNR 20.3 | 84 |
| Novel View Synthesis | ScanNet++ | PSNR 20.7 | 67 |
| Pose Estimation | ScanNet++ | -- | 32 |
| Novel View Synthesis | DL3DV 6view | PSNR 16.85 | 25 |
| Multi-view Depth Estimation | BlendedMVS | AbsRel 0.148 | 18 |
| Multi-View Camera Pose Estimation | ScanNet++ | RPA @ 5° 2.27e+3 | 14 |
| Multi-View Camera Pose Estimation | BlendedMVS | RPA (5°) 36.2 | 14 |
| Novel View Synthesis | WildRGB-D | PSNR 24.9 | 13 |
| 6-view Novel View Synthesis | Mip-NeRF 360 | PSNR 16.56 | 7 |
| Pose Estimation | WildRGB-D | RPA (5°) 90.8 | 6 |
Showing 10 of 12 rows
