Vision-Language Feature Alignment for Road Anomaly Segmentation
About
Safe autonomous systems operating in complex environments require robust road anomaly segmentation to identify unknown obstacles. However, existing approaches often rely on pixel-level statistics to decide whether a region appears anomalous. This reliance leads to high false-positive rates on semantically normal background regions such as sky or vegetation, and to poor recall of true out-of-distribution (OOD) instances, posing safety risks for robotic perception and decision-making. To address these challenges, we propose VL-Anomaly, a vision-language anomaly segmentation framework that incorporates semantic priors from pre-trained Vision-Language Models (VLMs). Specifically, we design a prompt-learning-driven alignment module that adapts Mask2Former's visual features to CLIP text embeddings of known categories, effectively suppressing spurious anomaly responses in background regions. At inference time, we further introduce a multi-source inference strategy that integrates text-guided similarity, CLIP-based image-text similarity, and detector confidence, enabling more reliable anomaly prediction by leveraging complementary information sources. Extensive experiments demonstrate that VL-Anomaly achieves state-of-the-art performance on benchmark datasets including RoadAnomaly, SMIYC, and Fishyscapes. Code is released at https://github.com/NickHezhuolin/VL-aligner-Road-anomaly-segment.
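The multi-source inference strategy described above combines three per-pixel cues into one anomaly map. The released code is the authoritative reference; as a rough illustration only, a weighted fusion of normalized cues might look like the sketch below (the function name `fuse_anomaly_scores`, the weights, and the convention that all three inputs measure "normality" are assumptions, not the paper's implementation):

```python
import numpy as np

def fuse_anomaly_scores(text_sim, clip_sim, det_conf, weights=(0.5, 0.3, 0.2)):
    """Fuse three per-pixel "normality" cues into a single anomaly map.

    text_sim: (H, W) similarity of visual features to known-class text embeddings
    clip_sim: (H, W) CLIP image-text similarity to known-class prompts
    det_conf: (H, W) detector confidence for known classes
    All weights are illustrative and assumed to sum to 1.
    """
    def norm(x):
        # Min-max normalize each cue to [0, 1] so the weighted sum is comparable.
        x = x.astype(np.float64)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    w1, w2, w3 = weights
    # High similarity / confidence indicates a known class, so invert each
    # normalized cue: low agreement with known categories means high anomaly.
    return (w1 * (1 - norm(text_sim))
            + w2 * (1 - norm(clip_sim))
            + w3 * (1 - norm(det_conf)))
```

With weights summing to 1, the fused map stays in [0, 1], so a single threshold can be swept to produce the AuPRC and FPR95 numbers reported below.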
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Anomaly Segmentation | Fishyscapes Lost & Found (val) | FPR95 | 8.4 | 74 |
| Anomaly Segmentation | Fishyscapes Static (val) | FPR95 | 0.023 | 53 |
| Anomaly Segmentation | SMIYC-RA21 (val) | AuPRC (%) | 95.1 | 22 |
| Anomaly Segmentation | SMIYC-RO21 (val) | AuPRC | 91 | 22 |
| Anomaly Segmentation | SMIYC-RA21 (val) | sIoU | 59.6 | 13 |