Multimodal Safety Evaluation on MM-SafeBench
[Chart: Forbidden Statements ASR by date. Highlighted point: GPT-4o, 1.04, Nov 30, 2024. Selectable metrics: Forbidden Statements ASR, Query Refusal ASR, MML WR ASR, MML M ASR, MML R ASR, MML B64 ASR.]
Evaluation Results

Method                                         Links    Forbidden Statements ASR  Query Refusal ASR  MML WR ASR  MML M ASR  MML R ASR  MML B64 ASR
GPT-4o (Evaluator=Llama-Guard-...)             2024.11  1.04                      13.23              95.21       95.48      95.91      96.4
Claude-3.5-Sonnet (Evaluator=Llama-Guard-...)  2024.11  2.44                      2.78               40.02       48.62      37.12      9.28
GPT-4o-Mini (Evaluator=Llama-Guard-...)        2024.11  16.13                     12.06              94.78       94.78      93.16      93.16
Qwen-VL-Max (Evaluator=Llama-Guard-...)        2024.11  27.84                     48.49              91.76       92.34      91.42      92.23