One-step next-observation prediction on SciWorld (test)
[Chart: score by date, with a metric toggle (Token F1, BLEU-4). Labeled point: Word2World, Token F1 96. Date shown: May 29, 2026.]
Evaluation Results

Method               Configuration              Date     Token F1  BLEU-4
Word2World           Backbone=Qwen3.5-4B, L...  2026.05  96        95
PatchWorld-Residual  Backbone=Mimo-v2.5, LL...  2026.05  69        56
PatchWorld-Simple    Backbone=Qwen3-Coder-4...  2026.05  57        39
PatchWorld-Residual  Backbone=Qwen3-Coder-4...  2026.05  56        48
PatchWorld-Residual  Backbone=DeepSeek-V4-F...  2026.05  56        46
LLM-Direct           Backbone=DeepSeek-V4-F...  2026.05  52        34
LLM-Direct           Backbone=Qwen3-Coder-4...  2026.05  48        30
PatchWorld-Simple    Backbone=Mimo-v2.5, LL...  2026.05  48        30
LLM-Direct           Backbone=Mimo-v2.5, LL...  2026.05  45        25
WorldCoder           Backbone=DeepSeek-V4-F...  2026.05  41        33
PoE-World            Backbone=Mimo-v2.5, LL...  2026.05  41        36
WorldCoder           Backbone=Qwen3-Coder-4...  2026.05  40        36
PoE-World            Backbone=Qwen3-Coder-4...  2026.05  39        34
WorldCoder           Backbone=Mimo-v2.5, LL...  2026.05  37        33
PatchWorld-Simple    Backbone=DeepSeek-V4-F...  2026.05  22        4
PoE-World            Backbone=DeepSeek-V4-F...  2026.05  21        21
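
The page does not say how the Token F1 column is computed. The following is a minimal sketch of one common definition, offered only as an illustration: it assumes lowercased whitespace tokenization and a multiset token overlap between the predicted and reference observation, reported on a 0-100 scale. The function name token_f1 and these tokenization choices are assumptions, not the benchmark's actual scoring code.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings, on a 0-100 scale.

    Assumption: tokens are lowercased whitespace-separated words.
    The benchmark may tokenize or normalize differently.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Two empty strings count as a perfect match; one empty is a miss.
        return 100.0 if pred_tokens == ref_tokens else 0.0
    # Multiset intersection: each shared token counts at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 100.0 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical observations, not taken from the benchmark data.
    print(token_f1("the door is open", "the door is closed"))
```

Under this definition, a prediction that gets three of four reference tokens right (with four predicted tokens) scores 75, so a leaderboard value in the 90s would mean near-verbatim agreement with the reference observation.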