
Tool Use on RobustBench-TC Perturbed 1.0 (test)

52.9 Accuracy (Perturbed)

Qwen3-14B

May 12, 2026
Updated 21d ago

Evaluation Results

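| Method | Date | Accuracy (Perturbed) | | | | |
|---|---|---|---|---|---|---|
| | 2026.05 | 52.9 | 0.039 | 0.143 | 0.464 | 0.25 |
| | 2026.05 | 52.7 | 0.028 | -0.01 | 0.397 | 0.218 |
| | 2026.05 | 52.4 | 0.024 | 0.104 | 0.366 | 0.377 |
| | 2026.05 | 50.8 | 0.034 | 0.12 | 0.509 | 0.396 |
| | 2026.05 | 50.2 | 0.049 | 0.157 | 0.326 | 0.276 |
| | 2026.05 | 47.9 | 0.023 | 0.156 | 0.401 | 0.229 |
| | 2026.05 | 47.6 | 0.049 | 0.173 | 0.429 | 0.296 |
| | 2026.05 | 47 | 0.011 | 0.09 | 0.334 | 0.202 |
| | 2026.05 | 46.3 | 0.009 | 0.147 | 0.331 | 0.231 |
| | 2026.05 | 46.1 | 0.049 | 0.152 | 0.364 | 0.232 |
| | 2026.05 | 44.8 | 0.019 | 0.171 | 0.409 | 0.293 |
| | 2026.05 | 44.2 | 0.013 | 0.16 | 0.318 | 0.213 |
| | 2026.05 | 43.7 | 0.04 | 0.168 | 0.411 | 0.417 |
| | 2026.05 | 43.2 | 0.033 | 0.174 | 0.481 | 0.379 |
| | 2026.05 | 40.5 | 0.024 | 0.166 | 0.456 | 0.333 |
| | 2026.05 | 40.4 | 0.02 | 0.143 | 0.45 | 0.315 |
| | 2026.05 | 39 | 0.025 | 0.171 | 0.463 | 0.296 |
| | 2026.05 | 39 | 0.014 | 0.195 | 0.446 | 0.237 |
| | 2026.05 | 36.7 | 0.04 | 0.2 | 0.405 | 0.205 |
| | 2026.05 | 34.1 | 0.009 | 0.177 | 0.372 | 0.174 |
| | 2026.05 | 26.7 | 0.019 | 0.184 | 0.443 | 0.577 |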