Text-based agent interaction on TextWorld Treasure
[Chart: Accuracy and Average Steps by submission date. Top entry: Agent-BRACE, Accuracy 81, May 12, 2026.]
Updated 21d ago
Evaluation Results
| Method | Backbone | Date | Accuracy | Average Steps |
|---|---|---|---|---|
| Agent-BRACE | Qwen3-4B-Inst... | 2026.05 | 81 | 29.9 |
| ReAct (RL) | Qwen3-4B-Inst... | 2026.05 | 74 | 16.5 |
| Direct-Action (RL) | Qwen3-4B-Inst... | 2026.05 | 72.5 | 28.1 |
| PABU | Qwen3-4B-Inst... | 2026.05 | 70.7 | 37.8 |
| ReAct | Qwen3-4B-Inst... | 2026.05 | 69.5 | 10 |
| Base Model | Qwen3-4B-Inst... | 2026.05 | 66 | 29.9 |