Text-based Task Completion on AlfWorld
Chart: Mean Normalised Score by date (Apr 19, 2026); top labelled entry: ReAct, 2.78.
Evaluation Results
Method            Model              Date     Mean Normalised Score
ReAct             Qwen-2.5 7B        2026.04  2.78
DORA              Qwen-2.5 7B        2026.04  2.78
DORA              Llama-3.1 8B       2026.04  2.78
DORA              Mistral Small 22B  2026.04  2.78
Zero-shot         Qwen-2.5 7B        2026.04  0
Chain of Thought  Qwen-2.5 7B        2026.04  0
Tree of Thought   Qwen-2.5 7B        2026.04  0
Prompt Explore    Qwen-2.5 7B        2026.04  0
Zero-shot         Llama-3.1 8B       2026.04  0
Chain of Thought  Llama-3.1 8B       2026.04  0
Tree of Thought   Llama-3.1 8B       2026.04  0
Prompt Explore    Llama-3.1 8B       2026.04  0
ReAct             Llama-3.1 8B       2026.04  0
Zero-shot         Mistral Small 22B  2026.04  0
Chain of Thought  Mistral Small 22B  2026.04  0
Tree of Thought   Mistral Small 22B  2026.04  0
Prompt Explore    Mistral Small 22B  2026.04  0
ReAct             Mistral Small 22B  2026.04  0
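As an illustration only, the following Python sketch transcribes the rows of the Evaluation Results table and averages each method's score over the backbone models it was run with. This per-method average is not reported on the leaderboard, and the names `ROWS` and `mean_by_method` are hypothetical. The page does not say how the Mean Normalised Score itself is normalised, so the sketch simply treats the listed values as given.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transcription of the Evaluation Results table: (method, model, score).
# Scores are copied as listed; how they were normalised is not stated on the page.
ROWS = [
    ("ReAct", "Qwen-2.5 7B", 2.78),
    ("DORA", "Qwen-2.5 7B", 2.78),
    ("DORA", "Llama-3.1 8B", 2.78),
    ("DORA", "Mistral Small 22B", 2.78),
    ("Zero-shot", "Qwen-2.5 7B", 0.0),
    ("Chain of Thought", "Qwen-2.5 7B", 0.0),
    ("Tree of Thought", "Qwen-2.5 7B", 0.0),
    ("Prompt Explore", "Qwen-2.5 7B", 0.0),
    ("Zero-shot", "Llama-3.1 8B", 0.0),
    ("Chain of Thought", "Llama-3.1 8B", 0.0),
    ("Tree of Thought", "Llama-3.1 8B", 0.0),
    ("Prompt Explore", "Llama-3.1 8B", 0.0),
    ("ReAct", "Llama-3.1 8B", 0.0),
    ("Zero-shot", "Mistral Small 22B", 0.0),
    ("Chain of Thought", "Mistral Small 22B", 0.0),
    ("Tree of Thought", "Mistral Small 22B", 0.0),
    ("Prompt Explore", "Mistral Small 22B", 0.0),
    ("ReAct", "Mistral Small 22B", 0.0),
]


def mean_by_method(rows):
    """Average each method's score over all backbone models listed for it."""
    scores = defaultdict(list)
    for method, _model, score in rows:
        scores[method].append(score)
    return {method: mean(values) for method, values in scores.items()}


if __name__ == "__main__":
    averages = mean_by_method(ROWS)
    # Print methods from highest to lowest average score.
    for method, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
        print(f"{method:<18} {avg:.2f}")
```

With the transcribed values, DORA averages 2.78 because it scores 2.78 on every backbone, while ReAct averages roughly 0.93 because its only non-zero score is on Qwen-2.5 7B.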