Lifelong Agent Bench OS Task
Best Success Rate (Last Epoch): 78.8 (MemRL)
[Chart: Success Rate (Last Epoch) and Cumulative Success Rate (CSR) over time, Jan 6, 2026 to May 10, 2026]
Updated 22d ago
Evaluation Results
| Method | Details | Date | Success Rate (Last Epoch) | Cumulative Success Rate (CSR) |
|---|---|---|---|---|
| MemRL | Model=GPT-4o-mini | 2026.01 | 78.8 | 80.4 |
| SkillMAS | Evaluation Source=Inte... | 2026.05 | 76.7 | - |
| MemP | Model=GPT-4o-mini | 2026.01 | 73.6 | 74.2 |
| Traj-Bootstrap | Evaluation Source=Inte... | 2026.05 | 70 | - |
| RAG | Model=GPT-4o-mini | 2026.01 | 69 | 70 |
| CDMem | Evaluation Source=Inte... | 2026.05 | 68 | - |
| No Memory | Model=GPT-4o-mini | 2026.01 | 67.4 | - |
| Mem0 | Model=GPT-4o-mini | 2026.01 | 67 | 70.2 |
| Self-RAG | Model=GPT-4o-mini | 2026.01 | 64.6 | 73.2 |
| ReAct | Evaluation Source=Inte... | 2026.05 | 62 | - |
| Direct LLM | Evaluation Source=Inte... | 2026.05 | 59.3 | - |
| Pass@10 | Model=GPT-4o-mini | 2026.01 | - | 75.6 |