
Vision-Language Feature Alignment for Road Anomaly Segmentation

About

Safe autonomous systems in complex environments require robust road anomaly segmentation to identify unknown obstacles. However, existing approaches often rely on pixel-level statistics to determine whether a region appears anomalous. This reliance leads to high false-positive rates on semantically normal background regions such as sky or vegetation, and poor recall of true out-of-distribution (OOD) instances, thereby posing safety risks for robotic perception and decision-making. To address these challenges, we propose VL-Anomaly, a vision-language anomaly segmentation framework that incorporates semantic priors from pre-trained Vision-Language Models (VLMs). Specifically, we design a prompt learning-driven alignment module that adapts Mask2Former's visual features to CLIP text embeddings of known categories, effectively suppressing spurious anomaly responses in background regions. At inference time, we further introduce a multi-source inference strategy that integrates text-guided similarity, CLIP-based image-text similarity, and detector confidence, enabling more reliable anomaly prediction by leveraging complementary information sources. Extensive experiments demonstrate that VL-Anomaly achieves state-of-the-art performance on benchmark datasets including RoadAnomaly, SMIYC, and Fishyscapes. Code is released at https://github.com/NickHezhuolin/VL-aligner-Road-anomaly-segment.
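The abstract does not state how the three inference cues are combined, so the following is only an illustrative sketch of one way per-pixel evidence for "known class" could be fused into an anomaly score. The function names (`known_class_score`, `fuse_anomaly_scores`), the equal default weights, and the weighted-average rule are all assumptions made for illustration, not the authors' method; the released repository is the authoritative reference.

```python
import numpy as np


def known_class_score(pixel_feats, text_embeds, eps=1e-8):
    """Hypothetical helper: best cosine similarity to any known-class text embedding.

    pixel_feats: (H, W, D) per-pixel visual features.
    text_embeds: (K, D) text embeddings of the K known categories.
    Returns an (H, W) map rescaled from [-1, 1] to [0, 1].
    """
    p = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + eps)
    t = text_embeds / (np.linalg.norm(text_embeds, axis=-1, keepdims=True) + eps)
    sims = p @ t.T                      # (H, W, K) cosine similarities
    return (sims.max(axis=-1) + 1.0) / 2.0


def fuse_anomaly_scores(text_sim, clip_sim, det_conf, weights=(1.0, 1.0, 1.0)):
    """Hypothetical fusion rule (an assumption, not taken from the paper).

    Each input is an (H, W) map in [0, 1] expressing how strongly a pixel
    looks like a known class. The anomaly score is one minus their
    weighted average, so it is high only where the cues jointly give
    little support for any known class.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                     # normalise so the result stays in [0, 1]
    known = w[0] * text_sim + w[1] * clip_sim + w[2] * det_conf
    return 1.0 - known
```

Under this sketch, a pixel whose features match a known-category embedding on all three cues receives an anomaly score near 0, while a pixel that no cue supports receives a score near 1.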

Zhuolin He, Jiacheng Tang, Jian Pu, Xiangyang Xue • 2026

Related benchmarks

Task                  Dataset                         Metric     Result  Rank
Anomaly Segmentation  Fishyscapes Lost & Found (val)  FPR95      8.4     74
Anomaly Segmentation  Fishyscapes Static (val)        FPR95      0.023   53
Anomaly Segmentation  SMIYC-RA21 (val)                AuPRC (%)  95.1    22
Anomaly Segmentation  SMIYC-RO21 (val)                AuPRC      91      22
Anomaly Segmentation  SMIYC RA 21 (val)               sIoU       59.6    13
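For readers unfamiliar with the FPR95 metric reported above: it is the false-positive rate measured at the score threshold where the true-positive rate reaches 95%. The sketch below shows one straightforward way to compute it from per-pixel anomaly scores and binary ground-truth labels. The function name `fpr_at_tpr` and its tie handling are illustrative assumptions and not taken from any benchmark toolkit; official evaluation scripts may differ in detail.

```python
import numpy as np


def fpr_at_tpr(scores, labels, target_tpr=0.95):
    """False-positive rate at the threshold that recovers `target_tpr` of positives.

    scores: anomaly scores, higher means more anomalous (any shape).
    labels: same shape, truthy where the pixel is a true anomaly.
    Returns a fraction in [0, 1]; multiply by 100 for a percentage.
    """
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative sample")

    # Highest threshold at which at least target_tpr of the positives
    # score at or above the threshold.
    pos_desc = np.sort(pos)[::-1]
    k = int(np.ceil(target_tpr * pos.size))
    threshold = pos_desc[k - 1]

    # Fraction of normal pixels wrongly flagged at that threshold.
    return float(np.mean(neg >= threshold))
```

Lower is better: a value of 0 means no normal pixel scores at or above the threshold needed to catch 95% of the anomalous pixels.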
