Interactive Fiction on TextWorld
Chart: Success Rate (%) and LLM Score (%) by date; best Success Rate 98.7 (LLM-only, May 15, 2026).
Evaluation Results
| Method | Setting | Date | Success Rate (%) | LLM Score (%) |
|---|---|---|---|---|
| LLM-only | Backbones=Averaged ove... | 2026.05 | 98.7 | 100 |
| Oracle Router | Backbones=Averaged ove... | 2026.05 | 98.6 | 35.4 |
| Heuristic Router | Backbones=Averaged ove... | 2026.05 | 98.4 | 99.9 |
| R2V | Backbones=Averaged ove... | 2026.05 | 98.2 | 41.7 |
| SLM-only | Backbones=Averaged ove... | 2026.05 | 64.6 | 0 |
| Entropy Router | Backbones=Averaged ove... | 2026.05 | 64.6 | 0 |