Agent Task on Agent
[Chart: Accuracy by date (May 20, 2026)]
Evaluation Results
Method                 Configuration               Date     Accuracy
Llama-3.2-1B-Instruct  Input Type=Clean, Exec...   2026.05  100
Llama-3.2-1B-Instruct  Input Type=Clean, Exec...   2026.05  100
Llama-3.2-1B-Instruct  Input Type=Trigger, Ex...   2026.05  100
Llama-3.2-3B-Instruct  Input Type=Clean, Exec...   2026.05  100
Llama-3.2-3B-Instruct  Input Type=Clean, Exec...   2026.05  100
Llama-3.2-3B-Instruct  Input Type=Trigger, Ex...   2026.05  100
Qwen2.5-1.5B-Instruct  Input Type=Clean, Exec...   2026.05  100
Qwen2.5-1.5B-Instruct  Input Type=Clean, Exec...   2026.05  100
Qwen2.5-3B-Instruct    Input Type=Clean, Exec...   2026.05  100
Qwen2.5-3B-Instruct    Input Type=Clean, Exec...   2026.05  100
Qwen2.5-3B-Instruct    Input Type=Trigger, Ex...   2026.05  100
Llama-3.2-3B-Instruct  Input Type=Trigger, Ex...   2026.05  97.5
Qwen2.5-1.5B-Instruct  Input Type=Trigger, Ex...   2026.05  93.8
Llama-3.2-1B-Instruct  Input Type=Trigger, Ex...   2026.05  83.8
Qwen2.5-3B-Instruct    Input Type=Trigger, Ex...   2026.05  75
Qwen2.5-1.5B-Instruct  Input Type=Trigger, Ex...   2026.05  70