One-step next-observation prediction on AlfWorld (test)
[Chart: Token F1 (selectable: BLEU-4). Top result: Word2World, 89 Token F1, May 29, 2026.]
Updated 2d ago
Evaluation Results
| Method | Configuration | Date | Token F1 | BLEU-4 |
|---|---|---|---|---|
| Word2World | Backbone=Qwen3.5-4B, L... | 2026.05 | 89 | 66 |
| PatchWorld-Residual | Backbone=Mimo-v2.5, LL... | 2026.05 | 77 | 50 |
| PatchWorld-Simple | Backbone=Mimo-v2.5, LL... | 2026.05 | 73 | 47 |
| PatchWorld-Residual | Backbone=Qwen3-Coder-4... | 2026.05 | 70 | 47 |
| WorldCoder | Backbone=Qwen3-Coder-4... | 2026.05 | 63 | 42 |
| PoE-World | Backbone=Qwen3-Coder-4... | 2026.05 | 62 | 40 |
| PoE-World | Backbone=Mimo-v2.5, LL... | 2026.05 | 62 | 40 |
| WorldCoder | Backbone=Mimo-v2.5, LL... | 2026.05 | 59 | 35 |
| PatchWorld-Residual | Backbone=DeepSeek-V4-F... | 2026.05 | 57 | 29 |
| LLM-Direct | Backbone=Mimo-v2.5, LL... | 2026.05 | 55 | 33 |
| LLM-Direct | Backbone=DeepSeek-V4-F... | 2026.05 | 55 | 34 |
| LLM-Direct | Backbone=Qwen3-Coder-4... | 2026.05 | 53 | 35 |
| PatchWorld-Simple | Backbone=DeepSeek-V4-F... | 2026.05 | 48 | 21 |
| WorldCoder | Backbone=DeepSeek-V4-F... | 2026.05 | 40 | 21 |
| PoE-World | Backbone=DeepSeek-V4-F... | 2026.05 | 37 | 14 |
| PatchWorld-Simple | Backbone=Qwen3-Coder-4... | 2026.05 | 36 | 10 |
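The leaderboard ranks methods by Token F1 and BLEU-4 between the predicted and the reference next observation, but the page does not define either metric. Below is a minimal sketch of the conventional definitions, not the benchmark's own scorer. The following are assumptions: lowercase whitespace tokenization, a single reference per step, sentence-level scoring, and add-one smoothing of the 2- to 4-gram precisions. The helper names `tokenize`, `token_f1`, and `bleu4` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split. The benchmark's real
    # tokenizer and normalization are not documented on this page.
    return text.lower().split()


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 between a predicted and a reference observation."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    if not pred or not ref:
        return 0.0
    # Multiset intersection: each shared token counts min(pred, ref) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def _ngrams(tokens: list[str], n: int) -> Counter:
    # Count every contiguous n-gram; empty if the sequence is shorter than n.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(prediction: str, reference: str) -> float:
    """Single-reference sentence BLEU-4: uniform weights plus brevity penalty."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        pred_counts = _ngrams(pred, n)
        # Clipped matches: a predicted n-gram counts at most as often
        # as it appears in the reference.
        matches = sum((pred_counts & _ngrams(ref, n)).values())
        total = sum(pred_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0  # no unigram overlap at all
            p_n = matches / total
        else:
            # Assumption: add-one smoothing for n >= 2 so that short,
            # partially correct outputs do not collapse to a score of 0.
            p_n = (matches + 1) / (total + 1)
        log_precision_sum += math.log(p_n)
    # Brevity penalty: penalize predictions shorter than the reference.
    c, r = len(pred), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the four precisions, scaled by the penalty.
    return bp * math.exp(log_precision_sum / 4)
```

The table's values appear to be on a 0 to 100 scale, so a per-step score would be multiplied by 100 (for example `100 * token_f1(pred, ref)`) and presumably averaged over the test steps. How the leaderboard actually aggregates across steps, and whether it uses corpus-level rather than sentence-level BLEU, is not stated here and may differ from this sketch.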