Medical Diagnosis on VeriSim Noisy Levels 1-3
Best result: 69.2 Top-1 Accuracy (Qwen-2.5-72B)
[Figure: results by submission date (Apr 12, 2026); chart metrics: Top-1 Accuracy, Accuracy Delta, Turns Delta]
Evaluation Results
| Method | Size | Type | Date | Top-1 Accuracy | Accuracy Delta | Turns Delta |
|---|---|---|---|---|---|---|
| Qwen-2.5-72B | 72B | General | 2026.04 | 69.2 | -15.3 | 34.5 |
| Llama-3.1-70B | 70B | General | 2026.04 | 65.5 | -16.6 | 35.3 |
| OpenBioLLM-70B | 70B | Medical | 2026.04 | 63.1 | -16.1 | 36.0 |
| Meditron-70B | 70B | Medical | 2026.04 | 62.8 | -15.6 | 35.2 |
| BioMistral-7B | 7B | Medical | 2026.04 | 41.8 | -22.4 | 50.0 |
| Llama-3.1-8B | 8B | General | 2026.04 | 40.2 | -21.6 | 48.8 |
| Mistral-7B | 7B | General | 2026.04 | 33.5 | -24.5 | 55.1 |
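The leaderboard values can be sliced programmatically. The minimal Python sketch below hard-codes the seven rows transcribed from the results table (nothing is fetched from the site), ranks methods by Top-1 Accuracy, and averages Top-1 Accuracy within size/type groups. The 70B size cutoff and the function names `rank_by_top1` and `mean_top1_by_group` are hypothetical, illustrative choices, not part of the benchmark definition.

```python
from collections import defaultdict
from statistics import mean

# (method, size in billions of parameters, type, top-1 accuracy,
#  accuracy delta, turns delta), transcribed from the results table.
RESULTS = [
    ("Qwen-2.5-72B", 72, "General", 69.2, -15.3, 34.5),
    ("Llama-3.1-70B", 70, "General", 65.5, -16.6, 35.3),
    ("OpenBioLLM-70B", 70, "Medical", 63.1, -16.1, 36.0),
    ("Meditron-70B", 70, "Medical", 62.8, -15.6, 35.2),
    ("BioMistral-7B", 7, "Medical", 41.8, -22.4, 50.0),
    ("Llama-3.1-8B", 8, "General", 40.2, -21.6, 48.8),
    ("Mistral-7B", 7, "General", 33.5, -24.5, 55.1),
]


def rank_by_top1(rows):
    """Return method names ordered by Top-1 Accuracy, highest first."""
    return [row[0] for row in sorted(rows, key=lambda row: row[3], reverse=True)]


def mean_top1_by_group(rows, large_threshold=70):
    """Average Top-1 Accuracy per (size class, model type).

    Hypothetical grouping: models with at least `large_threshold`
    billion parameters count as "large", all others as "small".
    """
    groups = defaultdict(list)
    for _method, size, kind, top1, _acc_delta, _turns_delta in rows:
        size_class = "large" if size >= large_threshold else "small"
        groups[(size_class, kind)].append(top1)
    # Round to two decimals for readability.
    return {key: round(mean(values), 2) for key, values in groups.items()}


if __name__ == "__main__":
    print(rank_by_top1(RESULTS))
    for key, value in sorted(mean_top1_by_group(RESULTS).items()):
        print(key, value)
```

Grouping by size before comparing model types keeps the 7B/8B models from dragging down the averages of the 70B-class models. With the transcribed values, the large General and large Medical groups average 67.35 and 62.95, and the small General and small Medical groups average 36.85 and 41.8.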