General Performance Evaluation on O2M Benign Clinical Queries
[Chart: RR by date. Top entry: H-R Demon, RR 30.56, Jun 8, 2025.]
Evaluation Results
| Method | Details | Date | RR |
|---|---|---|---|
| H-R Demon | Model=Llava-Med-v1, De... | 2025.06 | 30.56 |
| H-R Demon | Model=Llava-Med-v1, De... | 2025.06 | 22.78 |
| H-R Demon | Model=Llava-Med-v1, De... | 2025.06 | 12.78 |
| H-R Demon | Model=Llava-Med-v1, De... | 2025.06 | 6.31 |
| H-R Demon | Model=Llava-Med-v1.5,... | 2025.06 | 4.7 |
| H-R Demon | Model=Llava-Med-v1.5,... | 2025.06 | 4.7 |
| H-R Demon | Model=Llava-Med-v1.5,... | 2025.06 | 4.6 |
| H-R Demon | Model=Llava-Med-v1.5,... | 2025.06 | 4.44 |
| Baseline (No Demon) | Model=Llava-Med-v1.5,... | 2025.06 | 3.03 |
| B-A Demon | Model=Llava-Med-v1.5,... | 2025.06 | 2.83 |
| B-A Demon | Model=Llava-Med-v1.5,... | 2025.06 | 2.78 |
| B-A Demon | Model=Llava-Med-v1.5,... | 2025.06 | 2.53 |
| B-A Demon | Model=Llava-Med-v1.5,... | 2025.06 | 2.47 |
| Baseline (No Demon) | Model=Llava-Med-v1, De... | 2025.06 | 2.12 |
| B-A Demon | Model=Llava-Med-v1, De... | 2025.06 | 1.67 |
| B-A Demon | Model=Llava-Med-v1, De... | 2025.06 | 1.62 |
| B-A Demon | Model=Llava-Med-v1, De... | 2025.06 | 1.46 |
| B-A Demon | Model=Llava-Med-v1, De... | 2025.06 | 1.21 |