Program Synthesis on HumanEval (test)
Top result: SC-MAS, 92.37 accuracy (Jan 14, 2026)
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| SC-MAS | LLM backbone=LLM Pool,... | 2026.01 | 92.37 |
| MasRouter | LLM backbone=LLM Pool,... | 2026.01 | 90.62 |
| AFlow | LLM backbone=gpt-4o-mi... | 2026.01 | 90.06 |
| RouterDC | LLM backbone=LLM Pool,... | 2026.01 | 87.75 |
| FrugalGPT | LLM backbone=LLM Pool,... | 2026.01 | 87.31 |
| AgentPrune | LLM backbone=gpt-4o-mi... | 2026.01 | 86.8 |
| Vanilla | LLM backbone=claude-3.... | 2026.01 | 86.33 |
| PromptLLM | LLM backbone=LLM Pool,... | 2026.01 | 86.33 |
| GPTSwarm | LLM backbone=gpt-4o-mi... | 2026.01 | 86.28 |
| Vanilla | LLM backbone=gpt-4o-mi... | 2026.01 | 85.71 |
| AFlow | LLM backbone=gemini-1.... | 2026.01 | 85.69 |
| RouteLLM | LLM backbone=LLM Pool,... | 2026.01 | 83.85 |
| Vanilla | LLM backbone=gemini-1.... | 2026.01 | 82.61 |
| AgentPrune | LLM backbone=gemini-1.... | 2026.01 | 82.55 |
| GPTSwarm | LLM backbone=gemini-1.... | 2026.01 | 82.36 |
| Vanilla | LLM backbone=llama-3.1... | 2026.01 | 80.75 |