Codebase generation on URL Shortener App (test)
Chart: Feature Completeness (%) over time. Leading entry: Code-L2MAC, 91.6 (Oct 2, 2023).
Metrics: Feature Completeness (%), Error Count, Lines of Code (LOC), Tests Passed Score
Updated 1mo ago
Evaluation Results
| Method | Date | Feature Completeness (%) | Error Count | Lines of Code (LOC) | Tests Passed Score |
|---|---|---|---|---|---|
| Code-L2MAC | 2023.10 | 91.6 | 0 | 330 | 14 |
| GPT4 | 2023.10 | 53.6 | 0 | 119 | 2.56 |
| CodeT | 2023.10 | 52.9 | 0.05 | 110 | 3.6 |
| Self-Refine | 2023.10 | 47.9 | 0.05 | 124 | 3.65 |
| Reflexion | 2023.10 | 38.8 | 0.1 | 96.2 | 2.35 |
| AutoGPT (Backbone=GPT-4) | 2023.10 | 25.3 | 0 | 136 | 3.3 |