VisualToolBench

Benchmarks

Task Name               Dataset Name             SOTA Result        Trend
Multimodal Agent Task   VisualToolBench          Average@4: 46.5    24
Visual Tool Reasoning   VisualToolBench (test)   Average@4: 12.85   12