
Falcon: A Cross-Modal Evaluation Dataset for Comprehensive Safety Perception

About

Methods for evaluating the harmfulness of content generated by large language models (LLMs) have been well studied. However, approaches tailored to multimodal large language models (MLLMs) remain underdeveloped and lack depth. This work highlights the crucial role of visual information in moderating content in visual question answering (VQA), a dimension often overlooked in current research. To bridge this gap, we introduce Falcon, a large-scale vision-language safety dataset containing 57,515 VQA pairs across 13 harm categories. The dataset provides explicit annotations of harmful attributes for images, instructions, and responses, thereby facilitating a comprehensive evaluation of the content generated by MLLMs. Each annotation also includes the relevant harm categories along with an explanation supporting the corresponding judgment. We further propose FalconEye, a specialized evaluator fine-tuned from Qwen2.5-VL-7B on the Falcon dataset. Experimental results demonstrate that FalconEye reliably identifies harmful content in complex and safety-critical multimodal dialogue scenarios. It outperforms all other baselines in overall accuracy on our proposed Falcon-test dataset and on two widely used benchmarks, VLGuard and Beavertail-V, underscoring its potential as a practical safety auditing tool for MLLMs.

Qi Xue, Minrui Jiang, Runjia Zhang, Xiurui Xie, Pei Ke, Guisong Liu (2025)
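
The abstract describes per-sample annotations covering the image, the instruction, and the response, plus harm categories and an explanation, and reports results as overall accuracy. The following is a minimal sketch of how such a record and metric might look. The class `SafetyRecord`, its field names, and the helper `response_accuracy` are hypothetical and chosen only for illustration; they are not the schema or evaluation code released with Falcon, and the sample records are toy data.

```python
from dataclasses import dataclass, field


@dataclass
class SafetyRecord:
    """Hypothetical annotation record; field names are illustrative only."""
    image_harmful: bool
    instruction_harmful: bool
    response_harmful: bool
    harm_categories: list = field(default_factory=list)
    explanation: str = ""


def response_accuracy(records, predictions):
    """Fraction of records whose predicted response label matches the annotation."""
    if len(records) != len(predictions):
        raise ValueError("records and predictions must have the same length")
    if not records:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(r.response_harmful == p for r, p in zip(records, predictions))
    return correct / len(records)


# Toy data, not drawn from the dataset.
toy_records = [
    SafetyRecord(True, True, True, ["placeholder-category"], "placeholder explanation"),
    SafetyRecord(False, False, False),
    SafetyRecord(False, True, False),
    SafetyRecord(True, False, True, ["placeholder-category"]),
]
toy_predictions = [True, False, True, True]  # one mistake on the third record
toy_accuracy = response_accuracy(toy_records, toy_predictions)
```

Keeping the three harm flags separate, rather than collapsing them into one label, reflects the abstract's point that an image, an instruction, and a response can each be harmful independently.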

Related benchmarks

Task | Dataset | Result | Rank
NSFW image classification | PixArt ID | Accuracy 50.1 | 4
NSFW image classification | T2I Flux2 ID | Accuracy 55.01 | 4
NSFW image classification | SD T2I ID v1.5 | Accuracy 49.5 | 4
NSFW image classification | T2I Qwen-Image OOD | Accuracy 50.05 | 4
NSFW image classification | T2I SD OOD 3.5 | Accuracy 57.6 | 4
NSFW image classification | Flux1 T2I ID | Accuracy 50 | 4
NSFW image classification | T2I SD3 ID | Accuracy 50.13 | 4
NSFW image classification | SDXL T2I OOD | Accuracy 45.16 | 4
NSFW image classification | T2I Zimage OOD | Accuracy 55.74 | 4