Multimodal Safety Evaluation on SafeBench
[Chart: FS ASR over time. Highlighted point: Claude-3.5-Sonnet, 3.26 FS ASR, Nov 30, 2024. Selectable metrics: FS ASR, QR ASR, MML-WR ASR, MML-M ASR, MML-R ASR, MML-B64 ASR.]
Updated 4d ago
Evaluation Results
Method             Details                    Date     FS ASR  QR ASR  MML-WR ASR  MML-M ASR  MML-R ASR  MML-B64 ASR
Claude-3.5-Sonnet  Evaluator=Llama-Guard-...  2024.11  3.26    1.3     41.69       53.42      38.11      8.14
GPT-4o             Evaluator=Llama-Guard-...  2024.11  6.19    3.91    96.42       97.39      96.42      96.74
GPT-4o-Mini        Evaluator=Llama-Guard-...  2024.11  13.68   7.82    96.42       95.77      96.42      94.14
Qwen-VL-Max        Evaluator=Llama-Guard-...  2024.11  89.58   53.75   92.51       96.74      92.18      91.86