Functional correctness for backend applications on BaxBench
[Chart: Functional Correctness over time. Best result: 69.8, Claude-Sonnet-4.5, Dec 4, 2025]
Evaluation Results
Method                  Model Type       Date     Functional Correctness
Claude-Sonnet-4.5       Proprietary      2025.12  69.8
GPT-5                   Proprietary      2025.12  67.1
DeepSeek-V3.1-Nex-N1    Open Source...   2025.12  59.7
Kimi-K2-thinking        Open Source...   2025.12  57.4
DeepSeek-V3.1           Open Source...   2025.12  50.1
Gemini-2.5-pro          Proprietary      2025.12  49.7
Qwen3-32B               Open Source...   2025.12  35.6
Qwen3-32B-Nex-N1        Open Source...   2025.12  34.8
GLM-4.6                 Open Source...   2025.12  32.1
Qwen3-30B-A3B           Open Source...   2025.12  27.2
Minimax-M2              Open Source...   2025.12  23.4
Qwen3-30B-A3B-Nex-N1    Open Source...   2025.12  13.6
InternLM3-8B            Open Source...   2025.12  1.6
InternLM3-8B-Nex-N1     Open Source...   2025.12  0.3