Text-based agent interaction on TextWorld Cooking
[Chart: Accuracy and Steps over time. Highlighted point: Direct-Action (RL), Accuracy 76, May 12, 2026.]
Updated 21d ago
Evaluation Results
Method              Backbone          Date     Accuracy  Steps
Direct-Action (RL)  Qwen3-4B-Inst...  2026.05  76        31.5
Base Model          Qwen3-4B-Inst...  2026.05  69        33.9
Agent-BRACE         Qwen3-4B-Inst...  2026.05  68.7      44.7
PABU                Qwen3-4B-Inst...  2026.05  32.5      72.1
ReAct               Qwen3-4B-Inst...  2026.05  13.2      24.4
ReAct (RL)          Qwen3-4B-Inst...  2026.05  13        40.6