General Agentic Planning Through Simulative Reasoning with World Models

About

What does it mean to plan? Current agentic systems, whether scaffolded workflows or end-to-end policies, rely on reactive decision-making: selecting the next action via a fixed procedure with at most undifferentiated adaptive computation (e.g., chain-of-thought) lacking explicit modeling of future outcomes. This limits generalizability, as each new task demands re-engineering rather than transfer of shared reasoning capacity. Humans, by contrast, plan by mentally simulating consequences of candidate actions within an internal world model, a capacity known as simulative reasoning (System II) that supports flexible, goal-directed behavior across diverse contexts. We argue that simulative reasoning through a world model provides a general-purpose planning mechanism for agentic systems, improving upon reactive policies (System I) by grounding decisions in predicted future states rather than pattern-matched responses. To verify this, we introduce SiRA (Simulative Reasoning Architecture), a goal-oriented architecture instantiating simulative reasoning using an LLM-based world model with natural-language belief states, while remaining model-agnostic. We evaluate across three qualitatively distinct task categories: constrained navigation, multi-hop information aggregation, and general instruction following, in a web-browser environment. Across all categories, simulative reasoning achieves up to 124% higher task completion rates than a matched reactive baseline, and increases constrained navigation success from 0% to 32.2% compared to a representative open-web agent. The persistent advantage across distinct task types suggests the benefit stems from generalizable counterfactual evaluation rather than task-specific tuning.
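The contrast between reactive and simulative decision-making described above can be illustrated with a toy sketch. This is not SiRA's implementation: the number-line environment, the names `simulative_plan`, `toy_world_model`, and `toy_critic`, and the fixed "largest step" reactive rule are all hypothetical, chosen only to show the idea of scoring predicted future states before acting. In the paper's setting the world model is LLM-based and states are natural-language beliefs; here both are replaced by integers for brevity.

```python
from typing import Callable, List, Sequence, Tuple

# Toy setting (assumption): states are integers on a number line,
# actions are step sizes, and the goal is to land exactly on `goal`.
ACTIONS: List[int] = [1, 3]


def toy_world_model(state: int, action: int) -> int:
    """Hypothetical world model: predicts the next state for an action."""
    return state + action


def toy_critic(state: int, goal: int) -> float:
    """Hypothetical critic: higher score means closer to the goal."""
    return -abs(goal - state)


def reactive_policy(state: int) -> int:
    """Reactive (System I style): a fixed rule with no outcome prediction."""
    return max(ACTIONS)


def simulative_plan(
    state: int,
    goal: int,
    actions: Sequence[int],
    world_model: Callable[[int, int], int],
    critic: Callable[[int, float], float],
    horizon: int = 2,
) -> int:
    """Simulative (System II style): roll out candidate action sequences
    in the world model and return the first action of the best rollout."""
    best_score = float("-inf")
    best_first = actions[0]
    frontier: List[Tuple[int, List[int]]] = [(state, [])]
    for _ in range(horizon):
        next_frontier: List[Tuple[int, List[int]]] = []
        for s, plan in frontier:
            for a in actions:
                predicted = world_model(s, a)  # simulate, do not act
                candidate = plan + [a]
                score = critic(predicted, goal)
                # Strict '>' keeps the earliest (shortest) best plan on ties.
                if score > best_score:
                    best_score, best_first = score, candidate[0]
                next_frontier.append((predicted, candidate))
        frontier = next_frontier
    return best_first


def run(policy: Callable[[int], int], start: int, goal: int, max_steps: int = 5) -> int:
    """Execute a policy in the toy environment until the goal or step limit."""
    state = start
    for _ in range(max_steps):
        if state == goal:
            break
        # In this toy, the real environment behaves exactly like the model.
        state = toy_world_model(state, policy(state))
    return state


def simulative_policy(state: int, goal: int = 4) -> int:
    return simulative_plan(state, goal, ACTIONS, toy_world_model, toy_critic)


final_reactive = run(reactive_policy, start=0, goal=4)
final_simulative = run(simulative_policy, start=0, goal=4)
```

In this toy, the fixed rule overshoots the goal because it never checks where an action leads, while the simulative planner reaches it by comparing predicted states two steps ahead. Replanning after every executed action, as `run` does here, is one simple design choice; it is an assumption of this sketch rather than a claim about the paper's method.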

Mingkai Deng, Jinyu Hou, Zhiting Hu, Eric Xing • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Embodied Task Completion | EB-Habitat | Avg Success Rate: 48.4 | 63 |
| Embodied Instruction Following | EB-ALFRED 1.0 (test) | Success Rate (Avg): 45.6 | 20 |
| Constrained navigation | FlightQA | Accuracy: 32.2 | 5 |
| General Instruction Following | WebArena random 100-sample | Success Rate: 23 | 3 |
| Multi-hop information aggregation | FanOutQA | Accuracy: 29.8 | 3 |