Our new X account is live! Follow @wizwand_team for updates
WorkDL logo mark

FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution

About

A versatile video depth estimation model should (1) be accurate and consistent across frames, (2) produce high-resolution depth maps, and (3) support real-time streaming. We propose FlashDepth, a method that satisfies all three requirements, performing depth estimation on a 2044x1148 streaming video at 24 FPS. We show that, with careful modifications to pretrained single-image depth models, these capabilities are enabled with relatively little data and training. We evaluate our approach across multiple unseen datasets against state-of-the-art depth models, and find that ours outperforms them in terms of boundary sharpness and speed by a significant margin, while maintaining competitive accuracy. We hope our model will enable various applications that require high-resolution depth, such as video editing, and online decision-making, such as robotics. We release all code and model weights at https://github.com/Eyeline-Research/FlashDepth

Gene Chou, Wenqi Xian, Guandao Yang, Mohamed Abdelfattah, Bharath Hariharan, Noah Snavely, Ning Yu, Paul Debevec• 2025

Related benchmarks

TaskDatasetResultRank
Depth EstimationKITTI
AbsRel0.16
92
Depth EstimationTUM-RGBD
Abs Rel Error0.08
16
Metric Depth EstimationC3VD (first split)
Delta1 Acc73
13
Depth EstimationSintel
AbsRel0.36
12
Depth EstimationInfinigen
AbsRel0.318
9
Depth Estimation192-frame sequence 518x924 resolution (inference)
Inference Time (s)10.8
5
Depth EstimationAverage (Infinigen, Sintel, KITTI, TUM RGB-D)
AbsRel0.23
5
Showing 7 of 7 rows

Other info

Follow for update