DINO-Foresight: Looking into the Future with DINO

About

Predicting future dynamics is crucial for applications like autonomous driving and robotics, where understanding the environment is key. Existing pixel-level methods are computationally expensive and often focus on irrelevant details. To address these challenges, we introduce DINO-Foresight, a novel framework that operates in the semantic feature space of pretrained Vision Foundation Models (VFMs). Our approach trains a masked feature transformer in a self-supervised manner to predict the evolution of VFM features over time. By forecasting these features, we can apply off-the-shelf, task-specific heads for various scene understanding tasks. In this framework, VFM features are treated as a latent space, to which different heads attach to perform specific tasks for future-frame analysis. Extensive experiments demonstrate the strong performance, robustness, and scalability of our framework. Project page and code at https://dino-foresight.github.io/.

Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris, Nikos Komodakis • 2024
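
The pipeline described in the abstract (forecast features first, then attach a task head to the forecast) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the names `forecast_features` and `linear_head` are hypothetical, the feature values are made up, and plain linear extrapolation stands in for the masked feature transformer, which the abstract describes only at a high level.

```python
from typing import Callable, List

Feature = List[float]   # feature vector of one token (patch)
Frame = List[Feature]   # all token features of one frame


def forecast_features(context: List[Frame]) -> Frame:
    """Placeholder predictor: extrapolate linearly from the last two frames.

    DINO-Foresight trains a masked feature transformer for this step; the
    extrapolation below only marks where such a model would plug in.
    """
    prev, last = context[-2], context[-1]
    return [
        [2.0 * cur - old for old, cur in zip(tok_prev, tok_last)]
        for tok_prev, tok_last in zip(prev, last)
    ]


def linear_head(weights: List[List[float]]) -> Callable[[Frame], List[int]]:
    """Hypothetical task head: a per-token linear classifier."""
    def head(frame: Frame) -> List[int]:
        labels = []
        for token in frame:
            # One score per class: dot product of the class row and the token.
            scores = [sum(w * x for w, x in zip(row, token)) for row in weights]
            labels.append(scores.index(max(scores)))
        return labels
    return head


# Two context frames, two tokens each, 2-D features (toy values).
context = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.8, 0.2], [0.0, 1.0]],
]
future = forecast_features(context)              # predicted features of the next frame
segment = linear_head([[1.0, 0.0], [0.0, 1.0]])  # head sees features only, never pixels
labels = segment(future)
```

The point of the design is that the head never sees pixels: any head that works on present-frame features can be applied unchanged to forecast features.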

Related benchmarks

Task	Dataset	Metric	Result
Depth Forecasting	Cityscapes mid-term	Delta 1	85.4	13
Depth Forecasting	Cityscapes short-term	Delta 1 Accuracy	88.6	13
Future Semantic Segmentation	Cityscapes (test/val)	mIoU (All Classes)	44.75	12
Future Depth Estimation	Kubric (test)	d1 Score	69.31	12
Future Semantic Segmentation	Kubric (test)	mIoU (All Classes)	57.62	12
Future Surface Normals Estimation	Cityscapes (test/val)	A3	89.87	12
Future Depth Estimation	Cityscapes (test/val)	d1 Score	77.66	12
Future Surface Normals Estimation	Kubric (test)	a3	90.62	12
Depth Forecasting	KITTI 2011_09_26_drive_0002_sync	AbsRel (Short)	6.3	6
Depth Forecasting	KITTI 2011_10_03_drive_0047_sync	AbsRel (Short Range)	0.076	6

Showing 10 of 11 rows
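
For readers unfamiliar with the metrics named in the table, the sketch below uses their common definitions: delta-1 accuracy as the fraction of pixels with max(pred/gt, gt/pred) below 1.25, AbsRel as the mean of |pred - gt| / gt, and mIoU as the per-class intersection over union averaged across classes. Whether each benchmark row uses exactly these conventions (threshold, valid-pixel masking, per-image versus dataset-level accumulation) is an assumption. The function names and sample values are hypothetical.

```python
from typing import Sequence


def delta1(pred: Sequence[float], gt: Sequence[float], thresh: float = 1.25) -> float:
    """Fraction of pixels whose ratio max(pred/gt, gt/pred) is below thresh."""
    hits = sum(1 for p, g in zip(pred, gt) if max(p / g, g / p) < thresh)
    return hits / len(gt)


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute relative depth error: mean(|pred - gt| / gt)."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def mean_iou(pred: Sequence[int], gt: Sequence[int], num_classes: int) -> float:
    """Average intersection over union across classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy depth values (positive, one per pixel) and toy label maps.
pred_depth = [10.0, 20.0, 30.0, 44.0]
gt_depth = [10.0, 10.0, 30.0, 40.0]
d1 = delta1(pred_depth, gt_depth)      # 3 of 4 pixels fall within the 1.25 ratio
rel = abs_rel(pred_depth, gt_depth)    # (0 + 1.0 + 0 + 0.1) / 4
miou = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

These functions return fractions. Several values in the table (for example 88.6) appear to be reported as percentages, so multiply by 100 before comparing. Higher is better for delta-1 and mIoU, lower is better for AbsRel.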
