One-step next-observation prediction on Maze (test)
(Chart: Token F1 / BLEU-4 progress over time; highlighted point: PatchWorld-Residual, Token F1 98, May 29, 2026)
Evaluation Results
Method               Configuration              Date     Token F1  BLEU-4
-------------------  -------------------------  -------  --------  ------
PatchWorld-Residual  Backbone=DeepSeek-V4-F...  2026.05  98        93
Word2World           Backbone=Qwen3.5-4B, L...  2026.05  97        89
PatchWorld-Residual  Backbone=Qwen3-Coder-4...  2026.05  97        91
PatchWorld-Simple    Backbone=DeepSeek-V4-F...  2026.05  90        80
WorldCoder           Backbone=Mimo-v2.5, LL...  2026.05  88        78
PatchWorld-Simple    Backbone=Mimo-v2.5, LL...  2026.05  88        75
PatchWorld-Residual  Backbone=Mimo-v2.5, LL...  2026.05  87        82
PoE-World            Backbone=Mimo-v2.5, LL...  2026.05  86        78
LLM-Direct           Backbone=Qwen3-Coder-4...  2026.05  83        75
LLM-Direct           Backbone=Mimo-v2.5, LL...  2026.05  83        75
LLM-Direct           Backbone=DeepSeek-V4-F...  2026.05  83        73
WorldCoder           Backbone=Qwen3-Coder-4...  2026.05  83        76
PoE-World            Backbone=Qwen3-Coder-4...  2026.05  83        77
PatchWorld-Simple    Backbone=Qwen3-Coder-4...  2026.05  80        69
PoE-World            Backbone=DeepSeek-V4-F...  2026.05  76        61
WorldCoder           Backbone=DeepSeek-V4-F...  2026.05  73        61
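The page does not define how Token F1 is computed for this benchmark. One common definition, used for example in SQuAD-style QA evaluation, is the harmonic mean of token precision and recall over the multiset of tokens shared by the predicted and reference outputs. The sketch below assumes that definition, plain whitespace tokenization, per-example scores averaged over the test split, and a 0 to 100 scale to match the table. The function names `token_f1` and `mean_token_f1` and the sample observation strings are hypothetical; the benchmark's actual tokenizer, normalization and aggregation may differ.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 (hypothetical helper; assumes whitespace tokenization)."""
    pred = prediction.split()
    ref = reference.split()
    if not pred and not ref:
        return 1.0  # both empty: count as an exact match
    if not pred or not ref:
        return 0.0
    # Counter & Counter keeps each shared token min(count_pred, count_ref) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(predictions: list[str], references: list[str]) -> float:
    """Average per-example F1 over a split, scaled to 0-100 (assumed aggregation)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("need at least one example")
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return 100.0 * sum(scores) / len(scores)


# Hypothetical text observations: 5 of 6 tokens match on each side,
# so precision = recall = 5/6 and F1 is roughly 0.833.
print(token_f1("agent at 2 3 wall north", "agent at 2 4 wall north"))
```

Because this metric ignores token order, a prediction that places the right tokens in the wrong positions can still score well. That is one reason a leaderboard might report an order-sensitive n-gram metric such as BLEU-4 alongside it.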