Exploration on ALFWorld
[Chart: Success Rate (Last Epoch) over time, Jan 6, 2026 to Apr 10, 2026. Current best: MemRL, 94.9. Available metrics: Success Rate (Last Epoch), Cumulative Success Rate (CSR).]
Updated 6d ago
Evaluation Results
Method        Configuration               Date      Success Rate (Last Epoch)   Cumulative Success Rate (CSR)
MemRL         Model=GPT-5-mini            2026.01   94.9                        0.981
Self-RAG      Model=GPT-5-mini            2026.01   90.7                        0.962
Mem0          Model=GPT-5-mini            2026.01   89.4                        0.969
E3-TIR        Backbone=Qwen2.5-3B-In...   2026.04   89                          -
RAG           Model=GPT-5-mini            2026.01   88.7                        0.93
MemP          Model=GPT-5-mini            2026.01   88.5                        0.919
Zero-RL       Backbone=Qwen2.5-3B-In...   2026.04   86.5                        -
SFT-then-RL   Backbone=Qwen2.5-3B-In...   2026.04   81.5                        -
No Memory     Model=GPT-5-mini            2026.01   77.7                        -
Only SFT      Backbone=Qwen2.5-3B-In...   2026.04   62.5                        -
Pass@10       Model=GPT-5-mini            2026.01   -                           0.928