Complex Multi-step Reasoning on HLE
[Chart: Success Rate and Average Success Rate by date; top result EvoMAS-7, Success Rate 60 (May 9, 2026)]
Evaluation Results
Method       Category                    Date     Success Rate  Average Success Rate
EvoMAS-7     Method Category (Singl...   2026.05  60            65
G-Designer   Method Category (Singl...   2026.05  50            34.8
MaAS         Method Category (Singl...   2026.05  50            34.8
EvoMAS-4     Method Category (Singl...   2026.05  40            45
GPT-4o       Method Category (Singl...   2026.05  30            29.8
GPTSwarm     Method Category (Singl...   2026.05  20            18.3
AFlow        Method Category (Singl...   2026.05  20            22.1
GPT-4o-mini  Method Category (Singl...   2026.05  10            22.3