Long-horizon task execution on Long-horizon complex tasks (test)
[Chart: Success Rate by submission date; top result Claude Code at 100 (Apr 18, 2026). Other selectable metrics: Total Tokens Consumed, Execution Time (s), API Requests, Tool Calls.]
Evaluation Results
Method       #Tasks  Date     Success Rate (%)  Total Tokens Consumed  Execution Time (s)  API Requests  Tool Calls
Claude Code  5       2026.04  100               537,413                320.8               32.6          22.6
GA           5       2026.04  100               188,829                220.8               11            12.8
OpenClaw     5       2026.04  80                633,101                183.1               15            16.6
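As a hedged illustration of how the table can be read, this minimal Python sketch copies the three rows verbatim and derives two things the page does not show directly: the number of solved tasks, assuming Success Rate is a percentage of the 5 tasks, and an ordering by success rate with Total Tokens Consumed as a tie-breaker. The `Result` class, `solved_tasks`, and `rank_by_success_then_tokens` are hypothetical names, and the tie-break rule is an assumption for illustration, not the leaderboard's own ranking. The page does not say whether the token, time, request, and tool-call figures are totals or per-task means (fractional values such as 32.6 API requests suggest averages), so the sketch does not divide or combine them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One row of the Evaluation Results table (values copied verbatim)."""
    method: str
    num_tasks: int
    success_rate_pct: float  # assumption: a percentage of num_tasks
    total_tokens: int
    exec_time_s: float
    api_requests: float      # fractional values suggest an average (unconfirmed)
    tool_calls: float


ROWS = [
    Result("Claude Code", 5, 100, 537_413, 320.8, 32.6, 22.6),
    Result("GA", 5, 100, 188_829, 220.8, 11, 12.8),
    Result("OpenClaw", 5, 80, 633_101, 183.1, 15, 16.6),
]


def solved_tasks(r: Result) -> int:
    """Convert a success-rate percentage into a task count, e.g. 80% of 5 -> 4."""
    return round(r.success_rate_pct / 100 * r.num_tasks)


def rank_by_success_then_tokens(rows: list[Result]) -> list[Result]:
    """Hypothetical ordering: higher success rate first, fewer tokens breaks ties."""
    return sorted(rows, key=lambda r: (-r.success_rate_pct, r.total_tokens))


if __name__ == "__main__":
    for r in rank_by_success_then_tokens(ROWS):
        print(f"{r.method:<12} solved={solved_tasks(r)}/{r.num_tasks} "
              f"tokens={r.total_tokens:,}")
```

Under these assumptions GA and Claude Code both solve 5 of 5 tasks, with GA reporting about 35% of Claude Code's token count (188,829 vs 537,413), while OpenClaw solves 4 of 5.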