Agent Task Performance on AgentBoard ScienceWorld Naive Stream
[Chart: Success Rate (1-step). Top entry: MEMPROBE, 66.7, Jun 1, 2026. Metrics available: Success Rate (1-step), Precision Rate (1-step), Success Rate (2-step), Precision Rate (2-step).]
Updated 1d ago
Evaluation Results
| Method | Date | Success Rate (1-step) | Precision Rate (1-step) | Success Rate (2-step) | Precision Rate (2-step) |
|---|---|---|---|---|---|
| MEMPROBE | 2026.06 | 66.7 | 88.6 | 66.7 | 88.1 |
| ReAct | 2026.06 | 64.4 | 87.1 | 64.4 | 87.1 |
| ReMem | 2026.06 | 59.3 | 83.3 | 62.9 | 87.1 |
| ExpRAG | 2026.06 | 58.9 | 86 | 67.8 | 87.3 |