Text-based agent interaction on TextWorld Cooking (test)
Top result: Direct-Action (RL), 75.5 accuracy
[Chart omitted: Accuracy and Steps by method over time; data point dated May 12, 2026]
Evaluation Results

Method              Backbone          Date     Accuracy  Steps
Direct-Action (RL)  Qwen3-4B-Inst...  2026.05  75.5      31.9
Base Model          Qwen3-4B-Inst...  2026.05  69.5      34.1
Agent-BRACE         Qwen3-4B-Inst...  2026.05  69.0      44.6
Agent-BRACE         Qwen2.5-3B-In...  2026.05  58.5      60.3
MEM1                Qwen2.5-3B-In...  2026.05  52.5      48.0
Direct-Action (RL)  Qwen2.5-3B-In...  2026.05  51.5      46.1
ReAct (RL)          Qwen2.5-3B-In...  2026.05  34.5      44.4
PABU                Qwen2.5-3B-In...  2026.05  33.0      73.1
PABU                Qwen3-4B-Inst...  2026.05  32.5      75.6
ReAct               Qwen2.5-3B-In...  2026.05  27.5      38.4
ReAct               Qwen3-4B-Inst...  2026.05  13.5      24.4
ReAct (RL)          Qwen3-4B-Inst...  2026.05  13.0      40.6
MEM1                Qwen3-4B-Inst...  2026.05  10.0      10.0
Base Model          Qwen2.5-3B-In...  2026.05  2.5       98.1
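Because results are reported for two backbones, comparing methods is easiest within each backbone. The following is a minimal Python sketch, assuming the rows are typed in by hand exactly as listed in the table (backbone names left truncated as the page shows them). The helper names `best_per_backbone` and `gain_over_base` are hypothetical, and "gain" here simply means the accuracy difference from the row labeled Base Model on the same backbone, not a metric the leaderboard itself defines.

```python
# Rows copied from the Evaluation Results table above:
# (method, backbone, accuracy, steps). Backbone names are left truncated
# exactly as the leaderboard displays them.
RESULTS = [
    ("Direct-Action (RL)", "Qwen3-4B-Inst...", 75.5, 31.9),
    ("Base Model", "Qwen3-4B-Inst...", 69.5, 34.1),
    ("Agent-BRACE", "Qwen3-4B-Inst...", 69.0, 44.6),
    ("Agent-BRACE", "Qwen2.5-3B-In...", 58.5, 60.3),
    ("MEM1", "Qwen2.5-3B-In...", 52.5, 48.0),
    ("Direct-Action (RL)", "Qwen2.5-3B-In...", 51.5, 46.1),
    ("ReAct (RL)", "Qwen2.5-3B-In...", 34.5, 44.4),
    ("PABU", "Qwen2.5-3B-In...", 33.0, 73.1),
    ("PABU", "Qwen3-4B-Inst...", 32.5, 75.6),
    ("ReAct", "Qwen2.5-3B-In...", 27.5, 38.4),
    ("ReAct", "Qwen3-4B-Inst...", 13.5, 24.4),
    ("ReAct (RL)", "Qwen3-4B-Inst...", 13.0, 40.6),
    ("MEM1", "Qwen3-4B-Inst...", 10.0, 10.0),
    ("Base Model", "Qwen2.5-3B-In...", 2.5, 98.1),
]


def best_per_backbone(rows):
    """Hypothetical helper: highest-accuracy (method, accuracy) per backbone."""
    best = {}
    for method, backbone, acc, _steps in rows:
        # Keep the first row seen for a backbone, replace it only on a strictly higher accuracy.
        if backbone not in best or acc > best[backbone][1]:
            best[backbone] = (method, acc)
    return best


def gain_over_base(rows):
    """Hypothetical helper: accuracy minus the 'Base Model' row on the same backbone."""
    base = {b: acc for m, b, acc, _ in rows if m == "Base Model"}
    return {(m, b): acc - base[b] for m, b, acc, _ in rows if m != "Base Model"}


if __name__ == "__main__":
    for backbone, (method, acc) in best_per_backbone(RESULTS).items():
        print(f"{backbone}: {method} ({acc})")
    gains = gain_over_base(RESULTS)
    print(gains[("Direct-Action (RL)", "Qwen3-4B-Inst...")])  # 6.0
```

Run as a script, this prints Direct-Action (RL) as the top method on the Qwen3-4B backbone and Agent-BRACE on the Qwen2.5-3B backbone. Note that the Base Model row on Qwen2.5-3B scores only 2.5, so every method's gain over it on that backbone is large; the gains are only comparable within a backbone.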