DeFM: Learning Foundation Representations from Depth for Robotics

About

Depth sensors are widely deployed across robotic platforms, and advances in fast, high-fidelity depth simulation have enabled robotic policies trained on depth observations to achieve robust sim-to-real transfer for a wide range of tasks. Despite this, representation learning for the depth modality remains underexplored compared to RGB, where large-scale foundation models now define the state of the art. To address this gap, we present DeFM, a self-supervised foundation model trained entirely on depth images for robotic applications. Using a DINO-style self-distillation objective on a curated dataset of 60M depth images, DeFM learns geometric and semantic representations that generalize to diverse environments, tasks, and sensors. To retain metric awareness across multiple scales, we introduce a novel input normalization strategy. We further distill DeFM into compact models suitable for resource-constrained robotic systems. When evaluated on depth-based classification, segmentation, navigation, locomotion, and manipulation benchmarks, DeFM achieves state-of-the-art performance and demonstrates strong generalization from simulation to real-world environments. We release all our pretrained models, which can be adopted off-the-shelf for depth-based robotic learning without task-specific fine-tuning. Webpage: https://de-fm.github.io/
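For readers unfamiliar with the term, a DINO-style self-distillation objective can be sketched roughly as follows. This is a minimal illustration of the generic technique in plain Python, not DeFM's actual training code: the function names, temperatures, and momentum values are illustrative assumptions, and the abstract does not specify the authors' settings or their input normalization.

```python
import math

def log_softmax(logits, temperature):
    """Temperature-scaled log-softmax over a list of floats (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]

def self_distillation_loss(student_logits, teacher_logits, center,
                           student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between centered, sharpened teacher targets and student predictions.

    The temperatures are illustrative assumptions, not values taken from DeFM.
    """
    centered = [t - c for t, c in zip(teacher_logits, center)]
    teacher_probs = [math.exp(v) for v in log_softmax(centered, teacher_temp)]
    student_log_probs = log_softmax(student_logits, student_temp)
    return -sum(p * lq for p, lq in zip(teacher_probs, student_log_probs))

def update_center(center, teacher_logits_batch, momentum=0.9):
    """Exponential moving average of the batch-mean teacher logits."""
    n = len(teacher_logits_batch)
    batch_mean = [sum(col) / n for col in zip(*teacher_logits_batch)]
    return [momentum * c + (1.0 - momentum) * m for c, m in zip(center, batch_mean)]

def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights track the student via an exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

# Two views of the same image: the student is trained to match the teacher's output.
teacher_out = [2.0, 0.0, -1.0]
center = [0.0, 0.0, 0.0]
loss_aligned = self_distillation_loss([2.0, 0.0, -1.0], teacher_out, center)
loss_opposed = self_distillation_loss([-1.0, 0.0, 2.0], teacher_out, center)
```

In this sketch only the student would receive gradients; the teacher is updated through `ema_update`, and the running `center` is subtracted from the teacher logits, a common way to discourage collapse to a single output dimension.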

Manthan Patel, Jonas Frey, Mayank Mittal, Fan Yang, Alexander Hansson, Amir Bar, Cesar Cadena, Marco Hutter • 2026

Related benchmarks

Task | Dataset | Result | Rank
Semantic segmentation | ScanNet | mIoU 31.34 | 59
Semantic segmentation | SUN-RGBD | IoU 31.26 | 19
Image Classification | ImageNet 1k Depth | Top-1 Accuracy (KNN) 63.46 | 16
Semantic segmentation | OFFSED | mIoU 57.62 | 11
Semantic segmentation | TartanGround | mIoU 67.69 | 11
Semantic segmentation | GraspNet-1B | mIoU 27.85 | 11
Point-Goal navigation | Gibson (val) | SPL 88.76 | 6
Point-Goal navigation | MP3D (val) | SPL 0.7585 | 6