The Constant Eye: Benchmarking and Bridging Appearance Robustness in Autonomous Driving

About

Despite rapid progress, autonomous driving algorithms remain notoriously fragile under Out-of-Distribution (OOD) conditions. We identify a critical decoupling failure in current research: the lack of distinction between appearance-based shifts, such as weather and lighting, and structural scene changes. This leaves a fundamental question unanswered: Is the planner failing because of complex road geometry, or simply because it is raining? To resolve this, we establish navdream, a high-fidelity robustness benchmark leveraging generative pixel-aligned style transfer. By creating a visual stress test with negligible geometric deviation, we isolate the impact of appearance on driving performance. Our evaluation reveals that existing planning algorithms often show significant degradation under OOD appearance conditions, even when the underlying scene structure remains consistent. To bridge this gap, we propose a universal perception interface leveraging a frozen visual foundation model (DINOv3). By extracting appearance-invariant features as a stable interface for the planner, we achieve exceptional zero-shot generalization across diverse planning paradigms, including regression-based, diffusion-based, and scoring-based models. Our plug-and-play solution maintains consistent performance across extreme appearance shifts without requiring further fine-tuning. The benchmark and code will be made available.
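The idea of a frozen, appearance-invariant feature interface can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `frozen_encoder` is a simple per-image standardization standing in for a frozen foundation model (it is not DINOv3), `linear_planner` is a hypothetical one-output planner head, and `appearance_shift` is a uniform gain-and-bias change, which is far simpler than the generative style transfer used in the benchmark. The sketch only shows why a planner that reads invariant features is unaffected by a shift that leaves scene structure intact.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Features = List[float]


def frozen_encoder(image: Sequence[float]) -> Features:
    """Toy stand-in for a frozen foundation model (NOT DINOv3).

    Standardizing the pixels removes global gain and bias, so a uniform
    brightness/contrast change leaves the features unchanged.
    """
    mu = mean(image)
    sigma = pstdev(image) or 1.0  # fall back to 1.0 on a flat image
    return [(p - mu) / sigma for p in image]


def raw_pixels(image: Sequence[float]) -> Features:
    """Baseline interface: hand the planner the pixels directly."""
    return list(image)


def linear_planner(features: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical planner head with a single linear output."""
    return sum(f * w for f, w in zip(features, weights))


def appearance_shift(image: Sequence[float], gain: float, bias: float) -> List[float]:
    """Pixel-aligned appearance change: only intensities move, layout does not."""
    return [gain * p + bias for p in image]


def output_gap(
    encoder: Callable[[Sequence[float]], Features],
    image: Sequence[float],
    shifted: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Absolute change in planner output between the original and shifted image."""
    original_out = linear_planner(encoder(image), weights)
    shifted_out = linear_planner(encoder(shifted), weights)
    return abs(original_out - shifted_out)


# A flattened toy "scene" and fixed planner weights (both made up).
scene = [0.1, 0.4, 0.9, 0.3, 0.7, 0.2]
weights = [0.5, -0.2, 0.8, 0.1, -0.6, 0.3]
dusk = appearance_shift(scene, gain=0.5, bias=0.05)

gap_raw = output_gap(raw_pixels, scene, dusk, weights)
gap_frozen = output_gap(frozen_encoder, scene, dusk, weights)
print(f"raw-pixel planner gap: {gap_raw:.3f}")
print(f"frozen-feature planner gap: {gap_frozen:.3e}")
```

In this toy setting the planner weights are never retrained, which mirrors the plug-and-play claim: only the interface between image and planner changes. Whether a real foundation model is invariant to weather and lighting shifts is an empirical question that the benchmark is designed to measure.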

Jiabao Wang, Hongyu Zhou, Yuanbo Yang, Jiahao Shao, Yiyi Liao • 2026

Related benchmarks

Task	Dataset	Result
End-to-end Planning	navdream 1.0 (OOD)	NC 0.983	9
End-to-end Planning	NAVSIM 1.0 (test)	EPDMS 87.6	9
End-to-end Planning	navdream 1.0 (Origin)	NC 98.5	9
End-to-end Planning	NAVSIM hard 1.0	EPDMS 30.4	9

