Improved monocular depth prediction using distance transform over pre-semantic contours with self-supervised neural networks

About

Monocular depth estimation (MDE) with self-supervised training approaches struggles in low-texture areas, where photometric losses may lead to ambiguous depth predictions. To address this, we propose a novel technique that enhances spatial information by applying a distance transform over pre-semantic contours, augmenting discriminative power in low-texture regions. Our approach jointly estimates pre-semantic contours, depth, and ego-motion. The pre-semantic contours are leveraged to produce new input images, with variance augmented by the distance transform in uniform areas. This approach results in more effective loss functions, enhancing the training process for depth and ego-motion. We demonstrate theoretically that the distance transform is the optimal variance-augmenting technique in this context. In extensive experiments on KITTI, Cityscapes, Waymo, NYUv2, and ScanNet, our model demonstrates robust performance, surpassing competing self-supervised MDE methods.
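
To illustrate why a distance transform adds variance in uniform areas, here is a minimal sketch on a toy image. It is not the authors' implementation: the function name `contour_distance_map`, the brute-force computation, and the additive blending weight `0.1` are all assumptions made for illustration. In practice a library routine such as `scipy.ndimage.distance_transform_edt` computes the same Euclidean distances far more efficiently.

```python
import numpy as np


def contour_distance_map(contours: np.ndarray) -> np.ndarray:
    """Euclidean distance from every pixel to its nearest contour pixel.

    `contours` is a 2D boolean array, True on contour pixels.
    Brute force, O(H * W * N) for N contour pixels; fine for a toy example only.
    """
    h, w = contours.shape
    ys, xs = np.nonzero(contours)  # coordinates of contour pixels
    if ys.size == 0:
        raise ValueError("contour map has no contour pixels")
    gy, gx = np.mgrid[0:h, 0:w]  # pixel coordinate grids
    # Distance from each pixel to each contour pixel, shape (h, w, N).
    d = np.sqrt((gy[..., None] - ys) ** 2 + (gx[..., None] - xs) ** 2)
    return d.min(axis=-1)


# Toy scene: a perfectly uniform 5x7 image with one vertical contour at column 3.
contours = np.zeros((5, 7), dtype=bool)
contours[:, 3] = True
uniform = np.full((5, 7), 0.5)

dist = contour_distance_map(contours)

# Hypothetical blending: add the scaled distance map to the uniform image.
# The paper's actual way of producing the new input images may differ.
augmented = uniform + 0.1 * dist

print("variance before:", uniform.var())
print("variance after: ", augmented.var())
```

The uniform image carries no signal for a photometric loss, while the augmented one varies smoothly with distance to the contour, so shifted versions of it can be told apart.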

Marwane Hariat, Antoine Manzanera, David Filliat • 2026

Related benchmarks

Task	Dataset	Result
Depth Estimation	NYU v2 (test)	Threshold Accuracy (delta < 1.25) 85.9	438
Monocular Depth Estimation	NYU v2 (test)	Abs Rel 0.115	327
Optical Flow	KITTI 2015 (test)	--	122
Monocular Depth Estimation	Cityscapes	Accuracy (delta < 1.25) 85	74
Monocular Depth Estimation	KITTI	AbsRel 8.2	33
Monocular Depth Estimation	KITTI 2015 (test)	Abs Rel 0.116	22
Monocular Depth Estimation	KITTI 2015	Abs Rel 0.082	14
Depth Estimation	ScanNet v1 (test)	AbsRel 0.127	14
Visual Odometry	KITTI Odometry Seq. 09	t_err 8.39	12
Odometry	KITTI Odometry Sequence 10	Translational Error (%) 7.17	9

Showing 10 of 12 rows
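
For readers unfamiliar with the depth metrics in the table, the sketch below computes the two most common ones using their standard definitions: Abs Rel is the mean of |pred - gt| / gt, and the delta < 1.25 accuracy is the fraction of pixels where max(pred / gt, gt / pred) is below 1.25. The function names and the toy depth values are illustrative assumptions, not taken from the paper, and the sketch assumes all inputs are valid pixels with strictly positive depth.

```python
import numpy as np


def abs_rel(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    return float(np.mean(np.abs(pred - gt) / gt))


def delta_accuracy(pred: np.ndarray, gt: np.ndarray, threshold: float = 1.25) -> float:
    """Fraction of pixels whose ratio max(pred/gt, gt/pred) is below `threshold`."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < threshold))


# Toy ground-truth and predicted depths in metres (made-up values).
gt = np.array([2.0, 4.0, 10.0, 20.0])
pred = np.array([2.2, 3.0, 10.0, 30.0])

print("Abs Rel:", abs_rel(pred, gt))
print("delta < 1.25:", delta_accuracy(pred, gt))
```

Lower is better for Abs Rel, higher is better for the delta accuracy. Some rows above appear to report these as percentages rather than fractions.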
