Multi-discipline Reasoning on EMMA core
[Chart: Accuracy (y-axis) by date (x-axis, Feb 12, 2026); highlighted top result: Llama 4 Scout, 24.6]
Evaluation Results
| Method | Setting | Date | Accuracy |
|---|---|---|---|
| Llama 4 Scout | Zero-shot | 2026.02 | 24.6 |
| Qwen2.5-VL-32B + AT-RL (Ours) | Zero-shot | 2026.02 | 19.4 |
| Claude 3.5 Sonnet | Zero-shot | 2026.02 | 18.7 |
| Qwen2.5-VL-32B + VPPO | Zero-shot | 2026.02 | 17.8 |
| Qwen2.5-VL-72B Instruct | Zero-shot | 2026.02 | 17.7 |
| Gemini 2.0 Flash | Zero-shot | 2026.02 | 17.2 |
| Qwen2.5-VL-32B Instruct | Zero-shot | 2026.02 | 14.7 |
| OpenAI GPT-4o | Zero-shot | 2026.02 | 6.3 |