OS Interaction on LLAB (train)
[Chart: Success Rate (SR) and Cumulative Success Rate (CSR) over time. Top entry: MemQ, 80.34 SR, May 8, 2026.]
Updated 22d ago
Evaluation Results
| Method | Setting | Date | Success Rate (SR) | Cumulative Success Rate (CSR) |
|---|---|---|---|---|
| MemQ | Model=4o-mini | 2026.05 | 80.34 | 82.27 |
| MemRL | Model=4o-mini | 2026.05 | 77.13 | 80.8 |
| MemP | Model=4o-mini | 2026.05 | 76.07 | 79.27 |
| RAG | Model=4o-mini | 2026.05 | 65.2 | 68.47 |
| No Mem. | Model=4o-mini | 2026.05 | 63.6 | - |
| Mem0 | Model=4o-mini | 2026.05 | 62.53 | 78.4 |
| Self-RAG | Model=4o-mini | 2026.05 | 62.33 | 75.27 |