Instruction-following and procedural reasoning on SOP-Bench

Best Accuracy: 100 (GA)

Updated 3mo ago

Evaluation Results

Method	Links	Accuracy
GA 2026.04		100	2.02	53	2.08	0.48
OpenClaw 2026.04		100	2.6	40	2.64	0.38
OpenClaw 2026.04		95	2.91	46	2.96	0.32
GA 2026.04		90	893	32	924	0.97
Claude Code 2026.04		85	1.23	23	1.25	0.68