Clinical Question Answering on NEJM-MedQA
[Chart: best Accuracy 86.7 (SAG), Feb 8, 2026; metrics shown: Accuracy, Gap]
Evaluation Results
Method            Configuration (truncated)   Date      Accuracy  Gap
SAG               Backbone=Llama-3B each...   2026.02   86.7      10.2
SAG               Backbone=Qwen-4B each,...   2026.02   85.6      7
SAG               Backbone=Qwen-4B each,...   2026.02   85.6      5.4
SAG               Backbone=Llama-3B each...   2026.02   84        6.3
Single giant LLM  Backbone=Qwen-72B, Opt...   2026.02   72.5      15.1
SAG               Backbone=Qwen-4B each,...   2026.02   68.1      17.9
SAG               Backbone=Llama-3B each...   2026.02   64.9      19.7
Single giant LLM  Backbone=Qwen-72B, Opt...   2026.02   57.9      19.3
Single giant LLM  Backbone=Qwen-72B, Opt...   2026.02   55.7      30.9
Single giant LLM  Backbone=Llama-70B, Op...   2026.02   54.6      18.8
Single giant LLM  Backbone=Llama-70B, Op...   2026.02   50.8      30.7
Me-LLaMA          Model Type=Clinical sp...   2026.02   44.2      26.9
Single giant LLM  Backbone=Llama-70B, Op...   2026.02   42.4      17.4
Meditron          Model Type=Clinical sp...   2026.02   28.5      28.3