Sequential task management and state maintenance on Lifelong AgentBench
[Chart: Accuracy over time. Data point shown: GA, 100, Apr 18, 2026. Metric tabs: Accuracy, Input Tokens, Output Tokens, Total Tokens, Efficiency Score.]
Updated 1mo ago
Evaluation Results
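| Method | Model | Date | Accuracy | Input Tokens | Output Tokens | Total Tokens | Efficiency Score |
|---|---|---|---|---|---|---|---|
| GA | Claude Sonnet 4.6 | 2026.04 | 100 | 222 | 20 | 241 | 4.15 |
| GA | Minimax M2.7 | 2026.04 | 90 | 400 | 23 | 423 | 2.12 |
| Claude Code | Claude Sonnet 4.6 | 2026.04 | 75 | 800 | 14 | 814 | 0.92 |
| OpenClaw | Claude Sonnet 4.6 | 2026.04 | 70 | 1.43 | 21 | 1.45 | 0.48 |
| OpenClaw | Minimax M2.7 | 2026.04 | 70 | 1.2 | 17 | 1.22 | 0.57 |

Note: token counts are given as extracted, with unit suffixes lost. The OpenClaw input and total values (1.43, 1.45, 1.2, 1.22) appear to be on a scale 1,000 times larger than the other rows; only that reading is consistent with the output token counts and efficiency scores.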
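The page does not define the Efficiency Score. As a rough check, the listed values are consistent with 10 × Accuracy / Total Tokens. The sketch below tests that guess against the results above. The formula, the `efficiency` helper, and the reading of the OpenClaw totals 1.45 and 1.22 as 1450 and 1220 are all assumptions inferred from the numbers, not something the source states.

```python
# Rows copied from the results above: (method, model, accuracy, total_tokens, listed_score).
# ASSUMPTION: the OpenClaw totals 1.45 and 1.22 are read as 1450 and 1220,
# i.e. on the same scale as the other rows.
rows = [
    ("GA", "Claude Sonnet 4.6", 100, 241, 4.15),
    ("GA", "Minimax M2.7", 90, 423, 2.12),
    ("Claude Code", "Claude Sonnet 4.6", 75, 814, 0.92),
    ("OpenClaw", "Claude Sonnet 4.6", 70, 1450, 0.48),
    ("OpenClaw", "Minimax M2.7", 70, 1220, 0.57),
]


def efficiency(accuracy: float, total_tokens: float) -> float:
    """Hypothetical reconstruction of the score: 10 * accuracy / total tokens."""
    return 10 * accuracy / total_tokens


for method, model, accuracy, total, listed in rows:
    guess = efficiency(accuracy, total)
    # Token counts on the page are rounded, so small differences are expected.
    print(f"{method:12s} {model:18s} listed={listed:.2f} reconstructed={guess:.2f}")
```

Under this reading every reconstructed score lands within 0.02 of the listed one, which is about what rounding in the displayed token counts would explain. A match on five rows does not confirm that this is the actual definition.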