
LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models

About

This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that address the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework, describing a customizable safety taxonomy, data preprocessing, augmentation, and training setup. To teach a VLM safeguard about safety, we further create a multimodal safety dataset with high-quality human expert annotations, where each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B parameters, serve as a versatile tool for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard's performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework publicly available, including the dataset, model weights, and training code.
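In practice, a safeguard like this is queried as a LLaVA-style VLM: an image is paired with a policy prompt, and the model returns a structured assessment (rating, category, rationale). The sketch below illustrates that flow with Hugging Face transformers; the checkpoint name, abbreviated policy text, prompt template, and JSON schema are illustrative assumptions, not verbatim details from the paper.

```python
# Hedged sketch: querying a LlavaGuard-style safeguard for a policy-based
# safety assessment. Checkpoint name, policy wording, prompt format, and the
# output schema are assumptions for illustration.
import json
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "AIML-TUDA/LlavaGuard-7B"  # hypothetical checkpoint name

# Shortened stand-in for the full safety taxonomy the model is prompted with.
policy = (
    "Assess the image against the safety policy categories, e.g. "
    "O1 Hate/Humiliation, O2 Violence/Harm, O3 Sexual Content. "
    "Respond with a JSON object with keys 'rating', 'category', 'rationale'."
)

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID, device_map="auto")

image = Image.open("example.jpg")
# Assumed LLaVA-style chat template with an image placeholder token.
prompt = f"USER: <image>\n{policy} ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
response = processor.decode(output_ids[0], skip_special_tokens=True)

# The safeguard is trained to answer with a structured assessment, e.g.
# {"rating": "Unsafe", "category": "O2 Violence/Harm", "rationale": "..."}
assessment = json.loads(response.split("ASSISTANT:")[-1].strip())
print(assessment["rating"], assessment["category"])
print(assessment["rationale"])
```

Because the policy is part of the prompt rather than baked into the weights, categories can be added, removed, or declared exempt per deployment, which is what the abstract means by evaluating content "against flexible policies".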

Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, Patrick Schramowski • 2024

Related benchmarks

Task | Dataset | Metric | Result | Rank
Safety Evaluation | UnsafeBench | F1 Score | 63.4 | 24
Unsafe content detection | LlavaGuard | Accuracy | 82 | 14
Safety Evaluation | SMID (test) | F1 Score | 66.6 | 11
Safety Evaluation | UnsafeDiff (test) | F1 Score | 53 | 11
Safety Evaluation | UnsafeBench (test) | F1 Score | 53.7 | 11
Severity-wise Harmfulness Classification | BLM-Guard | Accuracy (High) | 73.3 | 9
Binary Harmfulness Detection | BLM-Guard | NR (B) | 82.3 | 9
Jailbreak Detection | FigStep | AUROC | 0.836 | 9
Unsafe content detection | VLGuard | F1 Score | 69.8 | 9
Jailbreak Detection | JailBreakV | AUROC | 84.26 | 9

(Showing 10 of 15 rows.)
