Multimodal Safety Evaluation on GOAT (test)
[Chart: Misogyny Accuracy by date. OSGA: 56.9 (Jan 30, 2026).]
Updated 3d ago
Evaluation Results
| Method | Base Model | Date | Misogyny Accuracy | Misogyny F1 Score | Offensiveness Accuracy | Offensiveness F1 Score | Sarcasm Accuracy | Sarcasm F1 Score | Harmfulness Accuracy | Harmfulness F1 Score | Average Accuracy | Average F1 Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OSGA | LLaVA-v1.5 | 2026.01 | 56.9 | 36.71 | 47.51 | 57.61 | 50.38 | 38.3 | 68.02 | 47.12 | 55.7 | 44.94 |
| Baseline | LLaVA-v1.5 | 2026.01 | 49.4 | 32.76 | 41.59 | 30.48 | 50.11 | 33.58 | 44 | 34.52 | 46.28 | 32.84 |
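The page does not say how the Average Accuracy and Average F1 Score columns are computed. The reported numbers are consistent with an unweighted mean of the four task scores (Misogyny, Offensiveness, Sarcasm, Harmfulness), rounded to two decimals. The sketch below checks that reading. The half-up rounding mode and the `macro_average` helper are assumptions made for illustration, not something the leaderboard documents.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-task scores copied from the results above, in the order
# Misogyny, Offensiveness, Sarcasm, Harmfulness.
# Strings keep the values exact when converted to Decimal.
scores = {
    "OSGA": {
        "accuracy": ["56.9", "47.51", "50.38", "68.02"],
        "f1": ["36.71", "57.61", "38.3", "47.12"],
    },
    "Baseline": {
        "accuracy": ["49.4", "41.59", "50.11", "44"],
        "f1": ["32.76", "30.48", "33.58", "34.52"],
    },
}


def macro_average(values):
    """Unweighted mean of the task scores, rounded half-up to 2 decimals.

    Assumption: the leaderboard averages the four tasks equally.
    """
    total = sum(Decimal(v) for v in values)
    mean = total / Decimal(len(values))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Recompute both averages for every method.
averages = {
    method: {metric: macro_average(vals) for metric, vals in metrics.items()}
    for method, metrics in scores.items()
}

for method, result in averages.items():
    print(method, "accuracy:", result["accuracy"], "f1:", result["f1"])
```

Under these assumptions the recomputed values match the reported averages for both rows, which suggests (but does not prove) that each task is weighted equally.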