
Tool Misuse on Single-Agent Evaluation Set

[Chart: R@5, axis range 0.9792 to 0.9954; labels: Query+, Jan 11, 2026]

Evaluation Results

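Date    | Col 1 | Col 2 | Col 3 | Col 4
2026.01 | 1     | 0.78  | -     | -
2026.01 | 1     | 0.83  | -     | -
2026.01 | 1     | 0.78  | -     | -
2026.01 | 1     | 0.83  | -     | -
2026.01 | 1     | 0.87  | -     | -
2026.01 | 0.98  | 0.87  | -     | -
2026.01 | -     | -     | 100   | 100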