Text-based Task Completion on Jericho
[Chart: Mean Normalised Score over time. Best result: ReAct, 3.37 (Apr 19, 2026). Updated 1mo ago.]
Evaluation Results
Method            Model              Date     Mean Normalised Score
ReAct             Mistral Small 22B  2026.04  3.37
DORA              Mistral Small 22B  2026.04  2.92
ReAct             Llama-3.1 8B       2026.04  2.88
Chain of Thought  Llama-3.1 8B       2026.04  2.22
Zero-shot         Llama-3.1 8B       2026.04  2.21
DORA              Llama-3.1 8B       2026.04  2.17
Tree of Thought   Llama-3.1 8B       2026.04  2.05
ReAct             Qwen-2.5 7B        2026.04  1.87
Prompt Explore    Mistral Small 22B  2026.04  1.7
Prompt Explore    Llama-3.1 8B       2026.04  1.6
Chain of Thought  Qwen-2.5 7B        2026.04  1.58
Chain of Thought  Mistral Small 22B  2026.04  1.43
DORA              Qwen-2.5 7B        2026.04  1.42
Tree of Thought   Qwen-2.5 7B        2026.04  1.4
Zero-shot         Mistral Small 22B  2026.04  1.35
Prompt Explore    Qwen-2.5 7B        2026.04  1.27
Tree of Thought   Mistral Small 22B  2026.04  1.27
Zero-shot         Qwen-2.5 7B        2026.04  0.66