Safety Evaluation on O2M Harmful Clinical Queries
Headline result: H-R Demon, ASR 13.43 (Jun 8, 2025)
Evaluation Results

Method               Settings (truncated)        Date     ASR
H-R Demon            Model=Llava-Med-v1, De...   2025.06  13.43
H-R Demon            Model=Llava-Med-v1, De...   2025.06  17.88
H-R Demon            Model=Llava-Med-v1, De...   2025.06  21.31
H-R Demon            Model=Llava-Med-v1, De...   2025.06  32.63
H-R Demon            Model=Llava-Med-v1.5,...    2025.06  42.61
H-R Demon            Model=Llava-Med-v1.5,...    2025.06  44.75
H-R Demon            Model=Llava-Med-v1.5,...    2025.06  45.1
H-R Demon            Model=Llava-Med-v1.5,...    2025.06  47.32
B-A Demon            Model=Llava-Med-v1.5,...    2025.06  56.82
B-A Demon            Model=Llava-Med-v1.5,...    2025.06  56.92
B-A Demon            Model=Llava-Med-v1.5,...    2025.06  57.53
B-A Demon            Model=Llava-Med-v1.5,...    2025.06  57.68
B-A Demon            Model=Llava-Med-v1, De...   2025.06  60.76
Baseline (No Demon)  Model=Llava-Med-v1.5,...    2025.06  61.52
B-A Demon            Model=Llava-Med-v1, De...   2025.06  65.05
B-A Demon            Model=Llava-Med-v1, De...   2025.06  68.79
Baseline (No Demon)  Model=Llava-Med-v1, De...   2025.06  72.58
B-A Demon            Model=Llava-Med-v1, De...   2025.06  74.9
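The results listed above mix two models and three methods, so a per-group summary can make them easier to compare. The following is a minimal sketch, not part of the benchmark itself: the values are transcribed from the listing, the truncated settings after the model name are left out, and the names `ROWS`, `summarize`, and `gap_to_baseline` are hypothetical. It assumes ASR is a percentage (commonly an attack success rate, which the page does not spell out) and that each model's "Baseline (No Demon)" row is the reference point for that model.

```python
from collections import defaultdict

# (method, model, ASR) transcribed from the results listing.
# The truncated settings ("De...", "...") are not reproduced here.
ROWS = [
    ("H-R Demon", "Llava-Med-v1", 13.43),
    ("H-R Demon", "Llava-Med-v1", 17.88),
    ("H-R Demon", "Llava-Med-v1", 21.31),
    ("H-R Demon", "Llava-Med-v1", 32.63),
    ("H-R Demon", "Llava-Med-v1.5", 42.61),
    ("H-R Demon", "Llava-Med-v1.5", 44.75),
    ("H-R Demon", "Llava-Med-v1.5", 45.1),
    ("H-R Demon", "Llava-Med-v1.5", 47.32),
    ("B-A Demon", "Llava-Med-v1.5", 56.82),
    ("B-A Demon", "Llava-Med-v1.5", 56.92),
    ("B-A Demon", "Llava-Med-v1.5", 57.53),
    ("B-A Demon", "Llava-Med-v1.5", 57.68),
    ("B-A Demon", "Llava-Med-v1", 60.76),
    ("Baseline (No Demon)", "Llava-Med-v1.5", 61.52),
    ("B-A Demon", "Llava-Med-v1", 65.05),
    ("B-A Demon", "Llava-Med-v1", 68.79),
    ("Baseline (No Demon)", "Llava-Med-v1", 72.58),
    ("B-A Demon", "Llava-Med-v1", 74.9),
]

BASELINE = "Baseline (No Demon)"


def summarize(rows):
    """Return {(model, method): (lowest ASR, highest ASR)}."""
    groups = defaultdict(list)
    for method, model, asr in rows:
        groups[(model, method)].append(asr)
    return {key: (min(vals), max(vals)) for key, vals in groups.items()}


def gap_to_baseline(rows):
    """Return {(model, method): lowest ASR minus that model's baseline ASR}.

    Assumption: each model has exactly one baseline row in the listing.
    """
    baseline = {model: asr for method, model, asr in rows if method == BASELINE}
    return {
        (model, method): round(lo - baseline[model], 2)
        for (model, method), (lo, _hi) in summarize(rows).items()
        if method != BASELINE
    }


if __name__ == "__main__":
    for (model, method), (lo, hi) in sorted(summarize(ROWS).items()):
        print(f"{model:15s} {method:20s} min={lo:6.2f} max={hi:6.2f}")
```

A negative gap means the method's lowest ASR falls below the model's baseline. Because the settings are truncated in the listing, rows within a group cannot be told apart here, which is why the sketch reports only the range per group.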