Agent Planning and API Calling on ToolBench

Best Plan ACC: 82.5 (GPT-4o)

[Chart: results over time; date tick at Jun 24, 2025]
Updated 17d ago

Evaluation Results

Date     Method  Plan ACC  -     -     -     -     -     -
2025.06  GPT-4o  82.5      57    31.1  22.2  37.3  99.9  55
2025.06  -       73.5      48.5  17.6  31.5  39    96.7  51.2
2025.06  -       73.3      47    16.3  29.6  36.8  94.5  49.6
2025.06  -       72.7      43.7  15.1  21    28.8  99.3  46.8
-        -       72.3      45.2  18.7  18.2  28.5  100   47.1
-        -       68.4      38    14.9  14.3  23.6  99.4  43.1
2025.06  -       55.7      33    10.1  20.5  26.5  95.7  40.3
2025.06  -       42.2      15    3.7   7.1   10.1  100   29.7