Medical Task on Medical
Chart (not reproduced): Accuracy by result date. Top entry: Llama-3.2-1B-Instruct, 100 (May 20, 2026).
Evaluation Results

Model                   Method                      Date     Accuracy
Llama-3.2-1B-Instruct   Input Type=Clean, Exec...   2026.05  100
Llama-3.2-1B-Instruct   Input Type=Clean, Exec...   2026.05  100
Llama-3.2-1B-Instruct   Input Type=Trigger, Ex...   2026.05  100
Llama-3.2-3B-Instruct   Input Type=Clean, Exec...   2026.05  100
Llama-3.2-3B-Instruct   Input Type=Clean, Exec...   2026.05  100
Llama-3.2-3B-Instruct   Input Type=Trigger, Ex...   2026.05  100
Qwen2.5-1.5B-Instruct   Input Type=Clean, Exec...   2026.05  100
Qwen2.5-1.5B-Instruct   Input Type=Clean, Exec...   2026.05  100
Qwen2.5-1.5B-Instruct   Input Type=Trigger, Ex...   2026.05  100
Qwen2.5-3B-Instruct     Input Type=Clean, Exec...   2026.05  100
Qwen2.5-3B-Instruct     Input Type=Clean, Exec...   2026.05  100
Qwen2.5-3B-Instruct     Input Type=Trigger, Ex...   2026.05  100
Qwen2.5-3B-Instruct     Input Type=Trigger, Ex...   2026.05  90
Qwen2.5-1.5B-Instruct   Input Type=Trigger, Ex...   2026.05  86.3
Llama-3.2-1B-Instruct   Input Type=Trigger, Ex...   2026.05  83.8
Llama-3.2-3B-Instruct   Input Type=Trigger, Ex...   2026.05  76.2