Long-horizon task execution on Long-horizon complex tasks (test)

Success Rate: 100

Claude Code

Updated 3mo ago

Evaluation Results

Method	Links	Success Rate
Claude Code 2026.04		100	537,413	320.8	32.6	22.6
GA 2026.04		100	188,829	220.8	11	12.8
OpenClaw 2026.04		80	633,101	183.1	15	16.6