Medical Question Answering Robustness on the 800-question medical testbed
[Chart: best IDC score over time (metrics shown: IDC, BSP); top entry R-FT, IDC 76.88, Apr 23, 2026]
Evaluation Results
Method | Setting | Date | IDC | BSP
R-FT | Model Backbone=Llama-3... | 2026.04 | 76.88 | 99.84
RBED+R-FT | Model Backbone=Llama-3... | 2026.04 | 76.88 | 99.87
PBT | Model Backbone=Llama-3... | 2026.04 | 74.5 | 61.4
DuET-PD | Model Backbone=Llama-3... | 2026.04 | 74.38 | 11.23
Warning Prompt | Model Backbone=Llama-3... | 2026.04 | 68.25 | 1.38
Vanilla | Model Backbone=Llama-3... | 2026.04 | 68.25 | 1.55
RBED | Model Backbone=Llama-3... | 2026.04 | 68.25 | 8
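The leaderboard does not define IDC or BSP, so the sketch below only computes raw score differences of each method against the Vanilla row, which makes the gains easier to compare. It assumes, from the leaderboard's ordering, that a higher IDC ranks higher; no direction is assumed for BSP. `Entry`, `ENTRIES` and `deltas_vs_baseline` are hypothetical names chosen for illustration, and the values are transcribed from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    method: str
    idc: float
    bsp: float


# Rows transcribed from the leaderboard (all share the truncated
# "Model Backbone=Llama-3..." setting, which is not reproduced here).
ENTRIES = [
    Entry("R-FT", 76.88, 99.84),
    Entry("RBED+R-FT", 76.88, 99.87),
    Entry("PBT", 74.5, 61.4),
    Entry("DuET-PD", 74.38, 11.23),
    Entry("Warning Prompt", 68.25, 1.38),
    Entry("Vanilla", 68.25, 1.55),
    Entry("RBED", 68.25, 8.0),
]


def deltas_vs_baseline(entries, baseline="Vanilla"):
    """Return (method, IDC delta, BSP delta) for every non-baseline row.

    Rows are sorted by IDC delta, highest first. Python's sort is stable,
    so methods with equal IDC keep their leaderboard order.
    """
    base = next(e for e in entries if e.method == baseline)
    rows = [
        (e.method, round(e.idc - base.idc, 2), round(e.bsp - base.bsp, 2))
        for e in entries
        if e.method != baseline
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for method, d_idc, d_bsp in deltas_vs_baseline(ENTRIES):
        print(f"{method:<15} IDC {d_idc:+6.2f}  BSP {d_bsp:+7.2f}")
```

Sorting only on IDC avoids guessing whether a larger BSP is better; adding BSP as a tie-breaker would require knowing its direction.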