Agent tool-use planning on EarthBench 104 Tools (test)

69.8 Tool Selection Accuracy (TSA)

Qwen3-235B

May 6, 2026
Updated 27d ago

Evaluation Results

Method  Links
2026.05  69.8  65.5  60.4  32.6  73.6  61.2  42.9
2026.05  64.3  62.1  52.3  35.5  71.5  55.3  30.8
2026.05  62.9  59.2  53    17.4  53.1  44    26.1
2026.05  54.1  50.5  41    56.5  77    64.9  52.6
2026.05  53.7  47.9  42.1  26.1  61.8  45.8  18.3
2026.05  51.1  48.8  43.2  60.9  78.4  66.3  50
2026.05  48.9  44.7  34.7  37    66.8  51.8  6.6
2026.05  39.9  36.9  27.9  6.5   44.8  35.8  3.4