Codebase generation on Online Chat App (test)
[Chart: Feature Completeness over time. Highlighted result: Code-L2MAC, 88.4 (Oct 2, 2023). Metrics available: Feature Completeness, Error Rate, Lines of Code (LOC), Test Pass Count.]
Updated 1mo ago
Evaluation Results
| Method | Date | Feature Completeness | Error Rate | Lines of Code (LOC) | Test Pass Count |
|---|---|---|---|---|---|
| Code-L2MAC | 2023.10 | 88.4 | 0 | 774 | 25.8 |
| AutoGPT (Backbone=GPT-4) | 2023.10 | 59.4 | 0 | 374 | 18.8 |
| Self-Refine | 2023.10 | 23.1 | 1.85 | 220 | 3.08 |
| CodeT | 2023.10 | 14.2 | 0.211 | 111 | 1.42 |
| GPT4 | 2023.10 | 11 | 0.346 | 127 | 1.2 |
| Reflexion | 2023.10 | 10.5 | 0 | 91.6 | 3.32 |
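
The results above can be compared programmatically. The following is a minimal Python sketch, not part of the benchmark itself: the `Result` type, its field names, and the helpers `rank_by` and `zero_error_methods` are illustrative assumptions, and the values are copied directly from the evaluation results listed above.

```python
from typing import List, NamedTuple


class Result(NamedTuple):
    """One leaderboard row (field names are illustrative assumptions)."""
    method: str
    feature_completeness: float
    error_rate: float
    loc: float
    test_pass_count: float


# Values copied from the evaluation results listed above.
RESULTS = [
    Result("Code-L2MAC", 88.4, 0, 774, 25.8),
    Result("AutoGPT", 59.4, 0, 374, 18.8),
    Result("Self-Refine", 23.1, 1.85, 220, 3.08),
    Result("CodeT", 14.2, 0.211, 111, 1.42),
    Result("GPT4", 11, 0.346, 127, 1.2),
    Result("Reflexion", 10.5, 0, 91.6, 3.32),
]


def rank_by(metric: str, descending: bool = True) -> List[Result]:
    """Return the rows sorted by the named metric field."""
    return sorted(RESULTS, key=lambda r: getattr(r, metric), reverse=descending)


def zero_error_methods() -> List[str]:
    """Return the methods whose reported error rate is exactly 0."""
    return [r.method for r in RESULTS if r.error_rate == 0]


if __name__ == "__main__":
    for row in rank_by("test_pass_count"):
        print(f"{row.method:12s} {row.test_pass_count}")
    print("Zero error rate:", ", ".join(zero_error_methods()))
```

Sorting by a different metric changes the ordering below the top two rows: Reflexion ranks last on Feature Completeness but third on Test Pass Count, so the choice of metric matters when comparing the lower-ranked methods.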