Visual Tool Reasoning on VisualToolBench (test)

12.85 Average@4

Agent-KB

Mar 12, 2026
Updated 1mo ago

Evaluation Results

Method  Links
2026.03   12.85   20.09   3.88
2026.03   11.8    19.62   3.82
2026.03   11.52   20.56   5.12
2026.03   11.09   18.69   2.65
2026.03   10.86   19.16   4.09
2026.03   10.51   17.76   -
2026.03   10.28   16.82   2.88
2026.03   10.28   19.43   4.19
2026.03   10.05   17.76   2.73
2026.03   10.04   18.22   3.71
2026.03   9.11    15.89   2.54
2026.03   7.94    12.62   -