
BIRD-Python

Benchmarks

Task Name        Dataset Name                 SOTA Result                            Trend
Code Generation  BIRD-Python Original (dev)   Execution Accuracy (Simple): 0.6584    14
Code Generation  BIRD-Python Verified         Execution Accuracy (Simple): 0.6995    14