Tool-use Agent Tasks on TinyAgent (500 samples, evaluation)
[Chart: Accuracy over time. Top result: 66.8, Baseline (normal SFT). Date shown: May 13, 2026. Available metrics: Accuracy, Latency (s), Speedup (×).]
Updated 20d ago
Evaluation Results

| Method | Model | Date | Accuracy | Latency (s) | Speedup (×) |
|---|---|---|---|---|---|
| Baseline (normal SFT) | Llama-3.2-3B | 2026.05 | 66.8 | 5 | - |
| Baseline (normal SFT) | Qwen2.5-3B | 2026.05 | 65.6 | 4.1 | - |
| AsyncIO (Async-SFT) | Llama-3.2-3B | 2026.05 | 65.2 | 2.5 | - |
| AsyncIO (Async-SFT) | Qwen2.5-3B | 2026.05 | 62.1 | 2.5 | - |
| Baseline | openai-realtime-1.5 | 2026.05 | 54.9 | 7.6 | - |
| AsyncIO | openai-realtime-1.5 | 2026.05 | 53.2 | 4.4 | 1.7 |
| AsyncIO (normal SFT) | Qwen2.5-3B | 2026.05 | 14.3 | - | - |
| AsyncIO (normal SFT) | Llama-3.2-3B | 2026.05 | 10.8 | - | - |
| Baseline (no SFT) | Qwen2.5-3B | 2026.05 | 3.2 | - | - |
| AsyncIO (no SFT) | Qwen2.5-3B | 2026.05 | 2.3 | - | - |
| AsyncIO (no SFT) | Llama-3.2-3B | 2026.05 | 2.1 | - | - |
| Baseline (no SFT) | Llama-3.2-3B | 2026.05 | 1.5 | - | - |
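
The page does not define how the Speedup (×) column is computed. A minimal sketch, assuming speedup means the baseline latency divided by the method latency for the same model, reproduces the one speedup the page reports (AsyncIO on openai-realtime-1.5). The function name `speedup` and the definition itself are assumptions for illustration, not something the page confirms.

```python
def speedup(baseline_latency_s: float, method_latency_s: float) -> float:
    """Assumed definition: baseline latency divided by method latency."""
    if method_latency_s <= 0:
        raise ValueError("method latency must be positive")
    return baseline_latency_s / method_latency_s


# Latencies (seconds) for openai-realtime-1.5, copied from the results listing.
baseline_latency_s = 7.6  # Baseline
asyncio_latency_s = 4.4   # AsyncIO

# Rounded to one decimal, this matches the 1.7 shown in the Speedup column.
reported_check = round(speedup(baseline_latency_s, asyncio_latency_s), 1)
print(reported_check)
```

The same function could be applied to the other baseline and AsyncIO latency pairs in the listing, but those values would be derived under the assumed definition and are not results reported on the page.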