Multimodal tool-use on GTA
[Chart: Answer Accuracy; top result GPT-5, 60.9 (Mar 17, 2026). Other metrics shown: Tool Accuracy, Code Execution Success Rate.]
Evaluation Results
| Method | Params | Date | Answer Accuracy | Tool Accuracy | Code Execution Success Rate |
|---|---|---|---|---|---|
| GPT-5 | - | 2026.03 | 60.9 | 68.3 | 98.7 |
| GPT-4.1 | - | 2026.03 | 58.4 | 65.1 | 94.3 |
| GPT-4o | - | 2026.03 | 57.1 | 63.4 | 95.1 |
| TraceR1 | 8B | 2026.03 | 56.7 | 65.7 | 87.4 |
| T3-Agent | 7B | 2026.03 | 53.8 | 64.6 | 84.3 |
| Qwen3-VL-8B | 8B | 2026.03 | 49.2 | 56.8 | 74.2 |
| Qwen2.5-VL-14B | 14B | 2026.03 | 46.8 | 55.4 | 69.8 |
| Qwen2.5-VL-7B | 7B | 2026.03 | 44.2 | 50.6 | 69.1 |
| DeepSeek-VL2 | 72B | 2026.03 | 23.2 | 49.4 | 57.2 |
| LLAVA-NeXT-8B | 8B | 2026.03 | 14.1 | 14.9 | 25.1 |