Agent interaction on Agent
[Chart: Clean Success (Eager) by date. Point shown: Llama-3.2-1B-Instruct, 100, on May 20, 2026. Selectable metrics: Clean Success (Eager), Clean Success (Compiled), Trigger Success (Eager), Trigger Success (Compiled).]
Updated 13d ago
Evaluation Results
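Method                | Method (details)          | Links   | Clean Success (Eager) | Clean Success (Compiled) | Trigger Success (Eager) | Trigger Success (Compiled)
----------------------|---------------------------|---------|-----------------------|--------------------------|-------------------------|---------------------------
Llama-3.2-1B-Instruct | Execution Backend=CUDA... | 2026.05 | 100                   | 100                      | 37.5                    | 68.8
Llama-3.2-3B-Instruct | Execution Backend=CUDA... | 2026.05 | 100                   | 100                      | 41.2                    | 56.2
Qwen2.5-1.5B-Instruct | Execution Backend=CUDA... | 2026.05 | 100                   | 100                      | 90                      | 66.2
Qwen2.5-3B-Instruct   | Execution Backend=CUDA... | 2026.05 | 100                   | 100                      | 65                      | 60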