Text-based agent interaction on TextWorld Treasure (test)
[Chart: Accuracy and Average Steps by date. Top result: Agent-BRACE, 81.5 Accuracy, May 12, 2026.]
Updated 21d ago
Evaluation Results

| Method | Backbone | Date | Accuracy | Average Steps |
|---|---|---|---|---|
| Agent-BRACE | Qwen2.5-3B-In... | 2026.05 | 81.5 | 32.1 |
| Agent-BRACE | Qwen3-4B-Inst... | 2026.05 | 81 | 30 |
| ReAct (RL) | Qwen3-4B-Inst... | 2026.05 | 74 | 16.5 |
| PABU | Qwen3-4B-Inst... | 2026.05 | 73.5 | 37.2 |
| PABU | Qwen2.5-3B-In... | 2026.05 | 72.5 | 34.4 |
| Direct-Action (RL) | Qwen3-4B-Inst... | 2026.05 | 72.5 | 28 |
| ReAct | Qwen3-4B-Inst... | 2026.05 | 69.5 | 10.3 |
| Direct-Action (RL) | Qwen2.5-3B-In... | 2026.05 | 67.5 | 32.6 |
| Base Model | Qwen3-4B-Inst... | 2026.05 | 65 | 30.3 |
| MEM1 | Qwen3-4B-Inst... | 2026.05 | 63.5 | 31.4 |
| ReAct (RL) | Qwen2.5-3B-In... | 2026.05 | 55 | 32.7 |
| ReAct | Qwen2.5-3B-In... | 2026.05 | 37 | 33.6 |
| MEM1 | Qwen2.5-3B-In... | 2026.05 | 30 | 47.7 |
| Base Model | Qwen2.5-3B-In... | 2026.05 | 7.5 | 93.2 |