Benchmarks
Aggregate Language and Logic Tasks on HumanEval++, MATH, MMLU-Redux
[Chart: Average Accuracy over time. Top method HieraMAS at 94.61 as of Feb 23, 2026. Updated 4d ago.]
Evaluation Results

Method                 Config                        Date     Average Accuracy
HieraMAS               Multi=true, Topo=true, ...    2026.02  94.61
Full-Graph             Multi=true, Topo=false, ...   2026.02  92.78
AFlow                  Multi=true, Topo=true, ...    2026.02  92.69
Self-Consistency+CoT   Multi=true, Topo=false, ...   2026.02  91.6
LLM-Debate             Multi=true, Topo=false, ...   2026.02  91.45
MASRouter              Multi=true, Topo=true, ...    2026.02  90.89
Random-Graph           Multi=true, Topo=false, ...   2026.02  90.85
GDesigner              Multi=true, Topo=true, ...    2026.02  90.68
CoT                    Multi=false, Topo=false, ...  2026.02  89.81
Self-Consistency       Multi=true, Topo=false, ...   2026.02  89.77
Base                   Multi=false, Topo=false, ...  2026.02  83.14
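The leaderboard reports a single "Average Accuracy" per method over the three task suites (HumanEval++, MATH, MMLU-Redux). A minimal sketch of how such an aggregate might be computed, assuming an unweighted mean over per-benchmark accuracies — the per-suite scores below are hypothetical, only the aggregation logic is illustrated:

```python
def average_accuracy(scores: dict[str, float]) -> float:
    """Unweighted mean of per-benchmark accuracies (percentages)."""
    return sum(scores.values()) / len(scores)

# Hypothetical per-suite scores for one method (not taken from the table):
scores = {"HumanEval++": 95.1, "MATH": 93.2, "MMLU-Redux": 95.5}
print(round(average_accuracy(scores), 2))  # unweighted mean of the three
```

If the suites differ greatly in size, a weighted mean over per-example counts would change the ranking; the page does not state which scheme is used.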