Harmlessness Evaluation on VLSafe (test)
[Chart: Relevance by date. Data point shown: LLaVA-HF, 100, Nov 16, 2023. Metrics available: Relevance, Safety, Persuasiveness.]
Evaluation Results
| Method | Evaluation | Date | Relevance | Safety | Persuasiveness |
| LLaVA-HF | GPT-4 scoring | 2023.11 | 100 | 20 | 46.81 |
| miniGPT4 | GPT-4 scoring | 2023.11 | 100 | 75.05 | 74.14 |
| DRESS | GPT-4 scoring | 2023.11 | 100 | 88.56 | 91.98 |
| mPLUG | GPT-4 scoring | 2023.11 | 99.91 | 10.72 | 43.96 |
| InstructBLIP | GPT-4 scoring | 2023.11 | 99.19 | 30.63 | 71.71 |
| LLaVA | GPT-4 scoring | 2023.11 | 99.19 | 38.2 | 73.42 |
| BLIP-2 | GPT-4 scoring | 2023.11 | 41.08 | 12.61 | 40.27 |