Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

About

Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR prompt at multiple scales within the depth decoder. To address the training challenges posed by the scarcity of datasets containing both LiDAR depth and precise ground-truth (GT) depth, we propose a scalable data pipeline that includes LiDAR simulation for synthetic data and pseudo GT depth generation for real data. Our approach sets a new state of the art on the ARKitScenes and ScanNet++ datasets and benefits downstream applications, including 3D reconstruction and generalized robotic grasping.
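As a rough illustration of what fusing a LiDAR prompt "at multiple scales within the depth decoder" could look like, the NumPy sketch below resizes a low-resolution depth map to each decoder feature resolution, projects it to the feature channel count, and adds it to the features. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `resize_nearest` and `fuse_prompt`, the nearest-neighbor resizing, and the per-channel scaling used as a stand-in for a learned projection are all hypothetical choices made for illustration.

```python
import numpy as np


def resize_nearest(depth, out_h, out_w):
    """Resize a 2D depth map to (out_h, out_w) with nearest-neighbor sampling."""
    in_h, in_w = depth.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return depth[rows[:, None], cols[None, :]]


def fuse_prompt(features, lidar_depth, projections):
    """Add a projected LiDAR prompt to decoder features at every scale.

    features:    list of arrays shaped (C_i, H_i, W_i), one per decoder scale
    lidar_depth: 2D array of low-resolution metric depth (the prompt)
    projections: list of arrays shaped (C_i,); a hypothetical per-channel
                 scaling standing in for a learned projection layer
    """
    fused = []
    for feat, proj in zip(features, projections):
        _, h, w = feat.shape
        prompt = resize_nearest(lidar_depth, h, w)            # (H_i, W_i)
        projected = proj[:, None, None] * prompt[None, :, :]  # (C_i, H_i, W_i)
        fused.append(feat + projected)
    return fused
```

With all-zero projections the fused features equal the original decoder features, so in a design like this the prompt branch could be attached to a pretrained decoder without changing its initial behavior.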

Haotong Lin, Sida Peng, Jingxiao Chen, Songyou Peng, Jiaming Sun, Minghuan Liu, Hujun Bao, Jiashi Feng, Xiaowei Zhou, Bingyi Kang • 2024

Related benchmarks

Task	Dataset	Result
Depth Completion	NYU-depth-v2 official (test)	--	200
Depth Completion	KITTI (test)	--	67
Depth Estimation	ARKitScenes	L1 Error 0.0132	57
Depth Completion	KITTI	RMSE 3.04	53
3D Reconstruction	DTU	Average Error 1.02	47
Depth Completion	iBIMS-1	MAE 0.122	43
Depth Super-Resolution / Completion	ETH-3D (test)	AbsRel 1.04	41
Depth Estimation	ScanNet++	AbsRel 0.0175	40
Depth Completion	VOID (test)	MAE 0.182	34
Depth Completion	ETH3D (test)	RMSE 0.644	32

Showing 10 of 36 rows
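
The table reports L1 error / MAE, RMSE, and AbsRel without defining them. The sketch below uses the conventional definitions of these metrics for depth maps; the masking rule (pixels with non-positive ground truth are ignored) and the function name `depth_metrics` are assumptions, and the individual benchmarks may use different units, masks, or depth caps.

```python
import numpy as np


def depth_metrics(pred, gt):
    """Compute common depth errors over pixels with valid ground truth.

    pred, gt: arrays of the same shape holding metric depth.
    Assumption: gt <= 0 marks a pixel without ground truth.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    p, g = pred[valid], gt[valid]
    abs_err = np.abs(p - g)
    return {
        "mae": float(abs_err.mean()),                    # L1 error / MAE
        "rmse": float(np.sqrt(((p - g) ** 2).mean())),   # root mean squared error
        "absrel": float((abs_err / g).mean()),           # mean absolute relative error
    }
```

For example, `depth_metrics(pred, gt)["absrel"]` gives the mean of `|pred - gt| / gt` over valid pixels, which is the usual meaning of an AbsRel score.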
