Multimodal Code Generation on V-Code
[Chart: Accuracy by date. Highlighted point: AutoTool (Qwen3-8B), Accuracy 56.1, Dec 15, 2025]
Updated 4d ago
Evaluation Results
Method                     Details                     Date      Accuracy
AutoTool (Qwen3-8B)        Size=8B, Backbone=Qwen3     2025.12   56.1
AutoTool (Qwen2.5-VL-7B)   Size=7B, Backbone=Qwen...   2025.12   52.5
GPT4o                      Size=-                      2025.12   51.2
Qwen2.5-VL-72B-Instruct    Size=72B                    2025.12   18.2
v-ToolRL                   Size=2B                     2025.12   13.3