Code Generation on RepoBench-P Python, XF-Random
Chart: Execution Match (EM) by date (other selectable metrics: ES, TTFT, E2E). Top entry: Ours, 64.5 EM, May 18, 2026.
Evaluation Results
| Method | Configuration | Date | Execution Match (EM) | Execution Success (ES) | Time to First Token (TTFT) | End-to-End Time (E2E) |
|---|---|---|---|---|---|---|
| Ours | Backbone=Llama-3.1-8B | 2026.05 | 64.5 | 79.2 | 118 | 3.8 |
| Repoformer | Backbone=Llama-3.1-8B | 2026.05 | 64.1 | 78.8 | 245 | 5.6 |
| RepoHyper | Backbone=Llama-3.1-8B | 2026.05 | 63.8 | 78.5 | 268 | 5.9 |
| RepoCoder | Backbone=Llama-3.1-8B | 2026.05 | 60.2 | 76.8 | 285 | 6.2 |
| Sync-RAG | Backbone=Llama-3.1-8B | 2026.05 | 58.6 | 75.4 | 312 | 6.8 |
| No-RAG | Backbone=Llama-3.1-8B | 2026.05 | 43.1 | 66.8 | 45 | 1.8 |
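The table invites two comparisons: how much EM each method gains over the No-RAG baseline, and how its latency compares with the top entry. A minimal Python sketch of that arithmetic follows, using only the numbers reported above. The helper name `summarize` and the choice of No-RAG and Ours as reference points are illustrative assumptions. Because the page does not state units for TTFT or E2E, the sketch computes only unit-free ratios.

```python
# Results copied from the table above (all entries use Backbone=Llama-3.1-8B).
# TTFT and E2E units are not stated on the page, so only ratios are computed.
RESULTS = {
    "Ours":       {"EM": 64.5, "ES": 79.2, "TTFT": 118, "E2E": 3.8},
    "Repoformer": {"EM": 64.1, "ES": 78.8, "TTFT": 245, "E2E": 5.6},
    "RepoHyper":  {"EM": 63.8, "ES": 78.5, "TTFT": 268, "E2E": 5.9},
    "RepoCoder":  {"EM": 60.2, "ES": 76.8, "TTFT": 285, "E2E": 6.2},
    "Sync-RAG":   {"EM": 58.6, "ES": 75.4, "TTFT": 312, "E2E": 6.8},
    "No-RAG":     {"EM": 43.1, "ES": 66.8, "TTFT": 45,  "E2E": 1.8},
}


def summarize(results, baseline="No-RAG", reference="Ours"):
    """Hypothetical helper: for each method, return
    (name, EM gain over `baseline`, TTFT ratio vs `reference`, E2E ratio vs `reference`)."""
    base_em = results[baseline]["EM"]
    ref = results[reference]
    rows = []
    for name, r in results.items():
        rows.append((
            name,
            round(r["EM"] - base_em, 1),        # absolute EM points gained
            round(r["TTFT"] / ref["TTFT"], 2),  # >1 means slower first token than reference
            round(r["E2E"] / ref["E2E"], 2),    # >1 means slower end to end than reference
        ))
    return rows


if __name__ == "__main__":
    for name, em_gain, ttft_x, e2e_x in summarize(RESULTS):
        print(f"{name:<11} EM {em_gain:+.1f}  TTFT {ttft_x:.2f}x  E2E {e2e_x:.2f}x")
```

With these inputs, Ours gains 21.4 EM points over No-RAG, while Repoformer's TTFT is about 2.08x and its E2E time about 1.47x that of Ours. No-RAG remains the fastest entry (TTFT about 0.38x of Ours) at a large accuracy cost.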