
Tool Use Reasoning on Tool use

Mean Accuracy @16: 61.31

GRPO

[Chart: value axis ticks -1.5892, 14.7404, 31.07, 47.3996; date label May 27, 2026]
Updated 6d ago

Evaluation Results

Date      Mean Accuracy @16   (unlabeled)   (unlabeled)
2026.05   61.31               -             -
2026.05   60.85               -             -
2026.05   60.23               -             -
2026.05   59.93               -             -
2026.05   59.44               -             -
2026.05   59.38               -             -
2026.05   59.38               -             -
2026.05   59.19               -             -
2026.05   59.16               -             -
2026.05   58.98               -             -
2026.05   58.92               -             -
2026.05   58.85               -             -
2026.05   58.27               -             -
2026.05   57.17               -             -
2026.05   57.05               -             -
2026.05   57.05               -             -
2026.05   56.46               -             -
2026.05   55.15               -             -
2026.05   54.87               -             -
2026.05   54.01               -             -
2026.05   39.64               -             -
2026.05   2.27                -             -
2026.05   0.86                -             -
2026.05   0.83                -             -
2026.01   -                   57.5          -
2026.01   -                   64.9          67.7
2026.01   -                   60.2          65.7
2026.01   -                   68            68.5
2026.01   -                   39.3          -
2026.01   -                   56.4          65
2026.01   -                   56.8          60.6
2026.01   -                   60.8          62.1