One-step next-observation prediction on TextCraft (test)
[Chart: Token F1 (selectable metrics: Token F1, BLEU-4). Highlighted point: PatchWorld-Residual, Token F1 95, May 29, 2026.]
Evaluation Results
Method               Configuration              Date     Token F1  BLEU-4
PatchWorld-Residual  Backbone=Mimo-v2.5, LL...  2026.05  95        68
Word2World           Backbone=Qwen3.5-4B, L...  2026.05  94        68
PatchWorld-Simple    Backbone=Mimo-v2.5, LL...  2026.05  93        67
PatchWorld-Residual  Backbone=DeepSeek-V4-F...  2026.05  91        66
WorldCoder           Backbone=Qwen3-Coder-4...  2026.05  88        61
WorldCoder           Backbone=Mimo-v2.5, LL...  2026.05  88        61
PoE-World            Backbone=Mimo-v2.5, LL...  2026.05  88        61
PatchWorld-Simple    Backbone=DeepSeek-V4-F...  2026.05  84        61
LLM-Direct           Backbone=DeepSeek-V4-F...  2026.05  78        51
PoE-World            Backbone=Qwen3-Coder-4...  2026.05  78        55
LLM-Direct           Backbone=Qwen3-Coder-4...  2026.05  76        51
LLM-Direct           Backbone=Mimo-v2.5, LL...  2026.05  76        50
PatchWorld-Residual  Backbone=Qwen3-Coder-4...  2026.05  71        48
WorldCoder           Backbone=DeepSeek-V4-F...  2026.05  60        35
PoE-World            Backbone=DeepSeek-V4-F...  2026.05  50        41
PatchWorld-Simple    Backbone=Qwen3-Coder-4...  2026.05  28        10
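
The page does not say how Token F1 is computed. As a rough reading aid, the sketch below shows one common definition: the harmonic mean of token-level precision and recall between a predicted and a reference observation. The lowercasing, whitespace tokenization, 0-100 scaling, the function name token_f1, and the example strings are all assumptions made for illustration, not details taken from the benchmark.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 on a 0-100 scale (assumed: lowercase, whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # Edge case: if either side is empty, score 100 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return 100.0 if pred_tokens == ref_tokens else 0.0

    # Multiset intersection counts each shared token at most as often
    # as it appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Hypothetical observation strings, not drawn from the TextCraft test set.
score = token_f1("Crafted 4 oak planks", "Crafted 4 birch planks")
```

Under this definition a prediction that differs from the reference by a single token still scores well, which is one reason a second metric such as BLEU-4, which also rewards matching word order, is useful alongside it.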