Code Generation on LIVECODE (held-out)
[Chart: DBS by date. Plotted point: EXPERT, 25.7 DBS, Feb 11, 2026. Available metrics: DBS, DVSavg@32.]
Evaluation Results
| Method | Details | Date | DBS | DVSavg@32 |
| --- | --- | --- | --- | --- |
| EXPERT | Model=Qwen3-1.7B optim... | 2026.02 | 25.7 | - |
| SOURCEbest | | 2026.02 | 10.3 | - |
| DataChef-32B | Oracle Upper Bound=tru... | 2026.02 | 10.3 | - |
| Kimi-K2 | | 2026.02 | 9.7 | 19.3 |
| Gemini-3-Pro | | 2026.02 | 9.1 | 53.6 |
| DataChef-32B | | 2026.02 | 9.1 | 45.8 |
| Qwen3-32B | | 2026.02 | 8 | 24.3 |
| Qwen3-Next ⊕ Kimi-K2 | Reasoning backbone=Qwe... | 2026.02 | 7.4 | 39.2 |
| SOURCEavg | | 2026.02 | 6.3 | - |