Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

About

Language agents based on large language models (LLMs) have demonstrated great promise in automating web-based tasks. Recent work has shown that incorporating advanced planning algorithms, e.g., tree search, is advantageous over reactive planning for web agents. However, unlike simulated sandbox environments, real-world environments such as the web are rife with irreversible actions. This undermines the feasibility of backtracking, a cornerstone of (tree) search. Relying too heavily on test-time search also hurts efficiency. We advocate model-based planning for web agents that employs a world model to simulate and deliberate over the outcome of each candidate action before committing to one. We systematically explore this paradigm by (1) proposing a model-based planning framework, WebDreamer, which employs LLMs to serve as both world models and value functions; and (2) training specialized LLMs as world models with a scalable data synthesis pipeline. Empirical results demonstrate that WebDreamer achieves substantial performance improvements over reactive baselines. It is competitive with tree search in sandbox environments (VisualWebArena) while being 4-5 times more efficient, and it also works effectively on real-world websites (Online-Mind2Web and Mind2Web-Live). Furthermore, our trained world model, Dreamer-7B, performs comparably to GPT-4o, highlighting the potential of specialized world models for efficient and effective planning in complex web environments.
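The core loop described in the abstract (simulate each candidate action with a world model, score the predicted outcome with a value function, then commit to the best action) can be illustrated with a minimal Python sketch. This is not the WebDreamer implementation: the function names `plan_one_step`, `toy_world_model`, and `toy_value_fn` are hypothetical, and the two toy functions are simple string-based stand-ins for what the paper describes as LLM calls.

```python
from typing import Callable, Sequence


def plan_one_step(
    task: str,
    observation: str,
    candidate_actions: Sequence[str],
    world_model: Callable[[str, str], str],
    value_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate action whose simulated outcome scores highest.

    Nothing is executed in the real environment here: each action is only
    "imagined" through the world model, so irreversible actions are never
    taken just to evaluate them.
    """
    if not candidate_actions:
        raise ValueError("need at least one candidate action")
    best_action = candidate_actions[0]
    best_score = float("-inf")
    for action in candidate_actions:
        predicted_state = world_model(observation, action)  # simulate outcome
        score = value_fn(task, predicted_state)             # judge progress
        if score > best_score:
            best_action, best_score = action, score
    return best_action


# Hypothetical stand-ins. In the paper's setting both roles are played by LLMs.
def toy_world_model(observation: str, action: str) -> str:
    """Pretend prediction: describe the page as 'observation after action'."""
    return f"{observation} -> after {action}"


def toy_value_fn(task: str, predicted_state: str) -> float:
    """Pretend value: fraction of task words that appear in the prediction."""
    words = task.lower().split()
    if not words:
        return 0.0
    state = predicted_state.lower()
    return sum(word in state for word in words) / len(words)


if __name__ == "__main__":
    chosen = plan_one_step(
        task="add to cart",
        observation="product page",
        candidate_actions=[
            "click 'Delete account'",
            "click 'Add to cart'",
            "scroll down",
        ],
        world_model=toy_world_model,
        value_fn=toy_value_fn,
    )
    print(chosen)
```

The sketch looks only one step ahead and selects greedily. Deeper simulation horizons and richer scoring are design choices it does not attempt to model.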

Yu Gu, Kai Zhang, Yuting Ning, Boyuan Zheng, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, Huan Sun, Yu Su • 2024

Related benchmarks

Task	Dataset	Metric	Result
Web navigation and task completion	WebArena (test)	Average Task Completion	31.86	137
Computer Use	OSWorld	OS Success Rate	46.35	45
Web Navigation Task Success	MIND2WEB ONLINE (test)	Task Success Rate (Overall)	35	41
Web task automation	VisualWebArena full	SR	22.7	21
End-to-end task execution	OSWorld (test)	Success Rate	31.24	12
Web task automation	Mind2Web Online (full)	Success Rate (SR)	14.7	3

