Agentic Coding on Project dev (test)
[Chart: Tau^2 by date. Top result: Claude-Sonnet-4.5, 0.881, Dec 4, 2025]
Updated 4d ago
Evaluation Results
Method                  Model Type       Date      Tau^2
Claude-Sonnet-4.5       Proprietary      2025.12   0.881
GPT-5                   Proprietary      2025.12   0.842
DeepSeek-V3.1-Nex-N1    Open Source...   2025.12   0.802
Minimax-M2              Open Source...   2025.12   0.762
GLM-4.6                 Open Source...   2025.12   0.759
Kimi-K2-thinking        Open Source...   2025.12   0.752
Qwen3-32B-Nex-N1        Open Source...   2025.12   0.721
Gemini-2.5-pro          Proprietary      2025.12   0.654
Qwen3-30B-A3B-Nex-N1    Open Source...   2025.12   0.653
InternLM3-8B-Nex-N1     Open Source...   2025.12   0.63
DeepSeek-V3.1           Open Source...   2025.12   0.428
Qwen3-32B               Open Source...   2025.12   0.415
Qwen3-30B-A3B           Open Source...   2025.12   0.285