Multi-language Programming on Aider-Polyglot
[Chart: Score by date. Top result: 61.7 (GPT-o1), Sep 26, 2025. Updated 1mo ago.]
Evaluation Results
Method                   Details                     Date     Score
GPT-o1                   Backbone=GPT-o1             2025.09  61.7
CRITIQUE-CODER           Backbone=Qwen3-8B, Thi...   2025.09  35.6
Qwen3-8B-RL              Backbone=Qwen3-8B, Thi...   2025.09  34.5
Baseline (Qwen3-8B)      Backbone=Qwen3-8B, Thi...   2025.09  28.4
CRITIQUE-CODER           Backbone=Qwen3-4B, Thi...   2025.09  24.4
Qwen3-4B-RL              Backbone=Qwen3-4B, Thi...   2025.09  23.6
Baseline (Qwen3-4B)      Backbone=Qwen3-4B, Thi...   2025.09  21.8
DeepSeek-R1-Distill-14B  Backbone=DeepSeek-R1-D...   2025.09  18.6
DeepCoder-14B            Backbone=DeepCoder-14B      2025.09  18.4
DeepSeek-V2.5-238B       Backbone=DeepSeek-V2.5...   2025.09  17.8