Coding on SWE-bench Lite (test)
[Chart: Accuracy. Best reported result: MAS-ZERO, 25.83 (May 21, 2025)]
Evaluation Results
Method        LLM Backbone   Date      Accuracy
MAS-ZERO      GPT-4o         2025.05   25.83
MAS-ZERO      Llama3.3       2025.05   16.74
AFlow         GPT-4o         2025.05   16.25
Debate        GPT-4o         2025.05   12.5
Self-Refine   GPT-4o         2025.05   11.67
MaAS          GPT-4o         2025.05   10
CoT           GPT-4o         2025.05   9.17
Debate        Llama3.3       2025.05   6.67
AFlow         Llama3.3       2025.05   6.67
MaAS          Llama3.3       2025.05   5
CoT           Llama3.3       2025.05   2.92
Self-Refine   Llama3.3       2025.05   1.67