Clinical Question Answering on GPQA Bio
Top result: SAG, 92.6 Accuracy
[Chart: Accuracy and Gap; Feb 8, 2026]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Accuracy | Gap |
|---|---|---|---|---|
| SAG | Backbone=Qwen-4B each,... | 2026.02 | 92.6 | 7 |
| SAG | Backbone=Llama-3B each... | 2026.02 | 88.6 | 6.3 |
| SAG | Backbone=Qwen-4B each,... | 2026.02 | 87.3 | 5.4 |
| SAG | Backbone=Llama-3B each... | 2026.02 | 79.4 | 10.2 |
| SAG | Backbone=Qwen-4B each,... | 2026.02 | 73.3 | 17.9 |
| Single giant LLM | Backbone=Qwen-72B, Opt... | 2026.02 | 72 | 19.3 |
| SAG | Backbone=Llama-3B each... | 2026.02 | 70.1 | 19.7 |
| Single giant LLM | Backbone=Qwen-72B, Opt... | 2026.02 | 67.3 | 15.1 |
| Single giant LLM | Backbone=Llama-70B, Op... | 2026.02 | 62 | 30.7 |
| Single giant LLM | Backbone=Llama-70B, Op... | 2026.02 | 61.3 | 18.8 |
| Me-LLaMA | Model Type=Clinical sp... | 2026.02 | 51.2 | 26.9 |
| Single giant LLM | Backbone=Llama-70B, Op... | 2026.02 | 43.6 | 17.4 |
| Single giant LLM | Backbone=Qwen-72B, Opt... | 2026.02 | 41.1 | 30.9 |
| Meditron | Model Type=Clinical sp... | 2026.02 | 20.4 | 28.3 |