Physical Reasoning on WorldModelBench

General Score: 41.8

Qwen9B-base

-0.63210.38421.432.416
May 11, 2026
Updated 22d ago

Evaluation Results

Method  Links
2026.05  41.8  41.6  41.7  47.8
2026.05  37.3  46.9  42.1  48.3
2026.05  28.2  35    31.6  37.7
2026.05  25.2  34.4  29.8  34.3
2026.05  21.3  40.9  31.1  35.7
2026.05  17.3  15.9  16.6  -6.3
2026.05  17.2  17.4  17.3  -6.5
2026.05  16.2  10    13.1  14.5
2026.05  15.53.3-1.1