Instruction-following and procedural reasoning on SOP-Bench
[Chart: Accuracy over time, y-axis 84.4 to 100; labeled point: GA, Apr 18, 2026. Selectable metrics: Accuracy, Input Tokens (k/M), Output Tokens (k), Total Tokens (k/M), Efficiency Score.]
Updated 1mo ago
Evaluation Results
| Method | Model | Date | Accuracy | Input Tokens | Output Tokens | Total Tokens | Efficiency Score |
|---|---|---|---|---|---|---|---|
| GA | Claude Sonnet 4.6 | 2026.04 | 100 | 2.02M | 53k | 2.08M | 0.48 |
| OpenClaw | Claude Sonnet 4.6 | 2026.04 | 100 | 2.6M | 40k | 2.64M | 0.38 |
| OpenClaw | Minimax M2.7 | 2026.04 | 95 | 2.91M | 46k | 2.96M | 0.32 |
| GA | Minimax M2.7 | 2026.04 | 90 | 893k | 32k | 924k | 0.97 |
| Claude Code | Claude Sonnet 4.6 | 2026.04 | 85 | 1.23M | 23k | 1.25M | 0.68 |
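The page does not define how the Efficiency Score is computed. The reported values are consistent with accuracy (as a fraction) divided by total tokens (in millions), and the totals agree with input plus output tokens within display rounding. The Python sketch below checks both relationships against the five rows above. It is an inference from the numbers, not the benchmark's documented formula: the names `Row`, `efficiency`, and `totals_consistent` are hypothetical, and the reading of each token figure as thousands (k) or millions (M) is an assumption.

```python
from dataclasses import dataclass

K, M = 1e3, 1e6  # assumed unit multipliers for the "k" and "M" token columns


@dataclass(frozen=True)
class Row:
    method: str
    model: str
    accuracy_pct: float    # Accuracy column, in percent
    input_tokens: float
    output_tokens: float
    total_tokens: float
    reported_score: float  # Efficiency Score column as shown on the page


# Values copied from the results; the k/M unit per cell is an assumption.
ROWS = [
    Row("GA", "Claude Sonnet 4.6", 100, 2.02 * M, 53 * K, 2.08 * M, 0.48),
    Row("OpenClaw", "Claude Sonnet 4.6", 100, 2.6 * M, 40 * K, 2.64 * M, 0.38),
    Row("OpenClaw", "Minimax M2.7", 95, 2.91 * M, 46 * K, 2.96 * M, 0.32),
    Row("GA", "Minimax M2.7", 90, 893 * K, 32 * K, 924 * K, 0.97),
    Row("Claude Code", "Claude Sonnet 4.6", 85, 1.23 * M, 23 * K, 1.25 * M, 0.68),
]


def efficiency(accuracy_pct: float, total_tokens: float) -> float:
    """Hypothesized score: accuracy fraction per million total tokens."""
    return (accuracy_pct / 100) / (total_tokens / M)


def totals_consistent(row: Row, tolerance: float = 10 * K) -> bool:
    """True if input + output matches the reported total within rounding slack."""
    return abs(row.input_tokens + row.output_tokens - row.total_tokens) <= tolerance


if __name__ == "__main__":
    for row in ROWS:
        score = efficiency(row.accuracy_pct, row.total_tokens)
        print(
            f"{row.method} / {row.model}: "
            f"computed={score:.2f} reported={row.reported_score:.2f} "
            f"totals_ok={totals_consistent(row)}"
        )
```

Under this reading, GA with Minimax M2.7 scores highest on efficiency despite lower accuracy because it uses under one million total tokens, while the two 100-accuracy runs each consume more than two million.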