Data Analysis on DABStep hard
Top result: Claude Sonnet 4.5, 37.04 Accuracy (Jan 22, 2026). Updated 4d ago.
Evaluation Results
| Method | Settings | Date | Accuracy |
|---|---|---|---|
| Claude Sonnet 4.5 | Model Type=Proprietary | 2026.01 | 37.04 |
| Claude Sonnet 4 | Model Type=Proprietary | 2026.01 | 31.75 |
| Kimi K2 Instruct | Model Type=Open-sourced | 2026.01 | 28.84 |
| GPT-5 | Reasoning effort=mediu... | 2026.01 | 28.31 |
| Deepseek-v3.1 | Model Type=Open-sourced | 2026.01 | 21.96 |
| Qwen3 235B Instruct | Model Type=Open-sourced | 2026.01 | 17.46 |
| Qwen3-Coder 480B | Model Type=Open-sourced | 2026.01 | 14.29 |
| GPT-5.1 | Reasoning effort=high,... | 2026.01 | 13.23 |
| GPT-5.1 | Reasoning effort=none,... | 2026.01 | 11.9 |
| GPT-OSS-120B | Model Type=Open-sourced | 2026.01 | 7.94 |
| GPT-4o | Model Type=Proprietary | 2026.01 | 7.41 |
| Qwen3-4B-Instruct | Model Type=Open-sourced | 2026.01 | 2.9 |
| Qwen2.5-7B-Instruct | Model Type=Open-sourced | 2026.01 | 2.38 |