Multimodal Safety Evaluation on multimodal safety dataset
[Chart: ASR by model. Highlighted point: Claude 3.5 Sonnet, ASR 0.13, Apr 13, 2025. Selectable metrics: ASR, FPR, Accuracy, Precision, F1 Score. Updated 1mo ago.]
Evaluation Results
Method             Date     ASR    FPR     Accuracy  Precision  F1 Score
Claude 3.5 Sonnet  2025.04  0.13   0.6944  0.59      0.56       0.68
GPT-4o Mini        2025.04  0.25   0.6444  0.55      0.54       0.63
Gemini 1.5 Pro     2025.04  0.255  0.2333  0.76      0.76       0.75
Claude 3 Haiku     2025.04  0.265  0.4722  0.63      0.61       0.67
GPT-4o             2025.04  0.44   0.1278  0.72      0.81       0.66
Gemini 1.5 Flash   2025.04  0.635  0.2389  0.56      0.6        0.46
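
The page does not say how the metric columns are computed. As a rough guide to reading them, the sketch below derives the five metrics from a binary confusion matrix. It makes several assumptions: "harmful" is treated as the positive class, and ASR (attack success rate) is taken to be the fraction of harmful inputs the model fails to refuse. The `Counts` type, the `safety_metrics` function, and the example counts are all hypothetical and are not taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical confusion-matrix counts; 'harmful' is the positive class."""
    tp: int  # harmful inputs the model refused or flagged
    fn: int  # harmful inputs that got through (assumed: a successful attack)
    fp: int  # benign inputs the model wrongly refused or flagged
    tn: int  # benign inputs the model handled normally


def safety_metrics(c: Counts) -> dict:
    """Standard binary-classification metrics; assumes no denominator is zero."""
    harmful = c.tp + c.fn
    benign = c.fp + c.tn
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / harmful
    return {
        "asr": c.fn / harmful,  # assumed definition: miss rate on harmful inputs
        "fpr": c.fp / benign,   # benign inputs wrongly blocked
        "accuracy": (c.tp + c.tn) / (harmful + benign),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up counts for illustration only.
example = safety_metrics(Counts(tp=80, fn=20, fp=10, tn=90))
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions a lower ASR and a lower FPR are both better, and the two usually trade off against each other: a model that refuses more aggressively lowers its ASR but blocks more benign inputs. This is why the table is easier to interpret when ASR and FPR are read together rather than ranking by ASR alone.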