Tool-use Agent Tasks on TinyAgent (500 samples, evaluation)

66.8 Accuracy

Baseline (normal SFT)

May 13, 2026
Updated 20d ago

Evaluation Results

Method  Links
2026.05  66.8  5    -
2026.05  65.6  4.1  -
2026.05  65.2  2.5  -
2026.05  62.1  2.5  -
2026.05  54.9  7.6  -
2026.05  53.2  4.4  1.7
2026.05  14.3  -    -
2026.05  10.8  -    -
2026.05  3.2   -    -
2026.05  2.3   -    -
2026.05  2.1   -    -
2026.05  1.5   -    -