Code Generation on BIRD-Python Verified

Execution Accuracy (Simple): 0.6995 (DeepSeek-R1)

[Chart: Execution Accuracy (Simple) over time; data point at Jan 22, 2026]

Evaluation Results

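| Date    | Method      | Execution Accuracy (Simple) | (unlabeled) | (unlabeled) | (unlabeled) |
|---------|-------------|-----------------------------|-------------|-------------|-------------|
| 2026.01 | DeepSeek-R1 | 0.6995                      | 0.5302      | 0.4552      | 0.6252      |
| 2026.01 |             | 0.6789                      | 0.5991      | 0.4621      | 0.6343      |
| 2026.01 |             | 0.6595                      | 0.5582      | 0.531       | 0.616       |
| 2026.01 |             | 0.6573                      | 0.5366      | 0.5241      | 0.6082      |
| 2026.01 |             | 0.6533                      | 0.5194      | 0.5448      | 0.601       |
| 2026.01 |             | 0.6216                      | 0.4978      | 0.4276      | 0.5658      |
| 2026.01 |             | 0.6076                      | 0.4569      | 0.4069      | 0.543       |
| 2026.01 |             | 0.5946                      | 0.4504      | 0.3931      | 0.5319      |
| 2026.01 |             | 0.547                       | 0.3793      | 0.3724      | 0.4798      |
| 2026.01 |             | 0.4951                      | 0.4159      | 0.3586      | 0.4583      |
| 2026.01 |             | 0.4191                      | 0.2651      | 0.2828      | 0.3595      |
| 2026.01 |             | 0.2546                      | 0.1724      | 0.131       | 0.1982      |
| 2026.01 |             | 0.0865                      | 0.0431      | 0.0207      | 0.0545      |
| 2026.01 |             | 0.0195                      | 0.0086      | 0           | 0.0112      |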