
DevEval

Benchmarks

Task Name                          Dataset Name                            SOTA Result           Trend
Docstring Evaluation               DevEval 183 human-written docstrings    Score: 4.938          5
Repository-level code generation   DevEval                                 Inference Time: 442   4
Terminal-related CLI agent task    DevEval                                 Accuracy: 39.74       2