Safety Evaluation on HADES-Dataset
[Chart: HADES Score. Highlighted result: Qwen-VL-Max, 43.67 (Nov 30, 2024). Available metrics: HADES Score, MML-WR, MML-M, MML-R, MML-B64.]
Updated 4d ago
Evaluation Results
| Method | Details | Date | HADES Score | MML-WR | MML-M | MML-R | MML-B64 |
|---|---|---|---|---|---|---|---|
| Qwen-VL-Max | Evaluator=Llama-Guard-... | 2024.11 | 43.67 | 96.22 | 96.51 | 97.23 | 95.49 |
| GPT-4o | Evaluator=Llama-Guard-... | 2024.11 | 2.62 | 97.96 | 97.82 | 97.09 | 97.53 |
| GPT-4o-Mini | Evaluator=Llama-Guard-... | 2024.11 | 2.62 | 97.82 | 97.23 | 96.65 | 94.76 |
| Claude-3.5-Sonnet | Evaluator=Llama-Guard-... | 2024.11 | 0 | 35.66 | 42.07 | 29.84 | 9.46 |