Code Generation on LiveCodeBench v6, HumanEval+, MBPP+, and SciCode
[Chart: Pass@1 by date. Top entry: DeepSeek-V3.1, Pass@1 = 0.992 (Jan 14, 2026). Updated 4d ago.]
Evaluation Results
| Method | Configuration | Date | Pass@1 |
| --- | --- | --- | --- |
| DeepSeek-V3.1 | Thinking Mode=true, Pa... | 2026.01 | 0.992 |
| GLM-4.6 | Thinking Mode=true, Pa... | 2026.01 | 0.989 |
| DeepSeek-V3.1 | Thinking Mode=true, Pa... | 2026.01 | 0.939 |
| A.X K1 | Thinking Mode=true, Pa... | 2026.01 | 0.93 |
| A.X K1 | Thinking Mode=true, Pa... | 2026.01 | 0.902 |
| A.X K1 | Thinking Mode=true, Pa... | 2026.01 | 0.872 |
| DeepSeek-V3.1 | Thinking Mode=true, Pa... | 2026.01 | 0.86 |
| GLM-4.6 | Thinking Mode=true, Pa... | 2026.01 | 0.86 |
| GLM-4.6 | Thinking Mode=true, Pa... | 2026.01 | 0.835 |
| GLM-4.6 | Thinking Mode=true, Pa... | 2026.01 | 0.76 |
| A.X K1 | Thinking Mode=true, Pa... | 2026.01 | 0.758 |
| A.X K1 | Thinking Mode=true, Pa... | 2026.01 | 0.731 |
| DeepSeek-V3.1 | Thinking Mode=true, Pa... | 2026.01 | 0.695 |
| DeepSeek-V3.1 | Thinking Mode=true, Pa... | 2026.01 | 0.662 |
| GLM-4.6 | Thinking Mode=true, Pa... | 2026.01 | 0.559 |
| DeepSeek-V3.1 | Thinking Mode=true, Pa... | 2026.01 | 0.391 |
| GLM-4.6 | Thinking Mode=true, Pa... | 2026.01 | 0.384 |
| A.X K1 | Thinking Mode=true, Pa... | 2026.01 | 0.324 |