Tool Use on Evaluation dataset

51.98 Accuracy

PORTool

Oct 29, 2025
Updated 1mo ago

Evaluation Results

Method  Links
Date     Accuracy
2025.10  51.98     3.07  7.1    0.851
2025.10  48.23     3.3   11.21  0.816
2025.10  48.18     3.19  12.34  0.819
2025.10  47.62     3.12  10.47  0.832
2025.10  47.58     3.18  11.52  0.826
2025.10  46.6      2.64  7.29   0.877
2025.10  46.09     3.31  13.27  0.808
2025.10  45.52     3.2   12.33  0.814
2025.10  44.97     3.16  11.56  0.827
2025.10  43.51     3.37  14.58  0.79
2025.10  42.76     2.79  9.91   0.857
2025.10  41.76     2.82  11.03  0.884
2025.10  41.73     2.94  11.4   0.81
2025.10  39.47     3.08  11.96  0.793
2025.10  39.27     3.07  11.96  0.784
2025.10  39.08     2.96  10.65  0.834
2025.10  37.2      3.21  13.27  0.767
2025.10  34.56     3.21  11.03  0.808
2025.10  24.36     4.76  58.5   0.466
2025.10  12.06     5.5   83.92  0.322