
Matrix3D: Large Photogrammetry Model All-in-One

About

We present Matrix3D, a unified model that performs several photogrammetry subtasks, including pose estimation, depth prediction, and novel view synthesis, with a single model. Matrix3D uses a multi-modal diffusion transformer (DiT) to integrate transformations across several modalities, such as images, camera parameters, and depth maps. The key to Matrix3D's large-scale multi-modal training is a mask learning strategy. This enables full-modality model training even with partially complete data, such as bi-modality image-pose and image-depth pairs, and thus significantly increases the pool of available training data. Matrix3D demonstrates state-of-the-art performance in pose estimation and novel view synthesis tasks. Additionally, it offers fine-grained control through multi-round interactions, making it an innovative tool for 3D content creation. Project page: https://nju-3dv.github.io/projects/matrix3d.

Yuanxun Lu, Jingyang Zhang, Tian Fang, Jean-Daniel Nahmias, Yanghai Tsin, Long Quan, Xun Cao, Yao Yao, Shiwei Li • 2025
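
The mask learning strategy described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one flag per modality per sample, and the names `sample_masks`, `masked_loss`, and `p_condition` are hypothetical. Each available modality is randomly assigned to be either a condition (given as input) or a target (to be predicted), and a missing modality is neither, so image-pose and image-depth pairs can both contribute to training.

```python
import numpy as np

MODALITIES = ("image", "pose", "depth")


def sample_masks(available, rng, p_condition=0.5):
    """Split the available modalities of one sample into conditions and targets.

    available: dict mapping modality name -> bool (is the data present?).
    Returns (condition, target), two dicts mapping modality name -> bool.
    Hypothetical helper for illustration only.
    """
    present = [m for m in MODALITIES if available[m]]
    condition = {m: False for m in MODALITIES}
    for m in present:
        condition[m] = bool(rng.random() < p_condition)
    # Keep at least one available modality as a target so the loss is defined.
    if present and all(condition[m] for m in present):
        condition[present[int(rng.integers(len(present)))]] = False
    # A missing modality is never a target, so it never enters the loss.
    target = {m: available[m] and not condition[m] for m in MODALITIES}
    return condition, target


def masked_loss(pred, truth, target):
    """Sum of per-modality mean squared errors over target modalities only."""
    total = 0.0
    for m in MODALITIES:
        if target[m]:
            total += float(np.mean((pred[m] - truth[m]) ** 2))
    return total
```

In this sketch, an image-pose pair is passed with `available["depth"] = False`; whatever the model outputs for depth is ignored by `masked_loss`, so the sample still provides a training signal for the other two modalities.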

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Depth Prediction | ETH3D | AbsRel 19.7 | 37 |
| Novel View Synthesis | Google Scanned Objects (GSO) (test) | PSNR 19.941 | 24 |
| Novel View Synthesis | Mip-NeRF 360 out-of-domain 3 | PSNR 13.97 | 8 |
| 3D Reconstruction | GSO (test) | Chamfer Distance (CD) 0.058 | 8 |
| Source View Depth Estimation | GSO (test) | Relative Error (Rel) 8.782 | 8 |
| Novel View Synthesis | RealEstate10K 58 (test) | PSNR 14.49 | 8 |
| Novel View Synthesis | DL3DV 27 (test) | PSNR 13.33 | 8 |
| Novel View Depth Estimation | GSO (test) | Relative Error 8.897 | 5 |
| Pose Estimation | GSO (test) | RA@5 43.77 | 5 |