
PoE-World: Compositional World Modeling with Products of Programmatic Experts

About

Learning how the world works is central to building AI agents that can adapt to complex environments. Traditional world models based on deep learning demand vast amounts of training data, and do not flexibly update their knowledge from sparse observations. Recent advances in program synthesis using Large Language Models (LLMs) give an alternate approach which learns world models represented as source code, supporting strong generalization from little data. To date, application of program-structured world models remains limited to natural language and grid-world domains. We introduce a novel program synthesis method for effectively modeling complex, non-gridworld domains by representing a world model as an exponentially-weighted product of programmatic experts (PoE-World) synthesized by LLMs. We show that this approach can learn complex, stochastic world models from just a few observations. We evaluate the learned world models by embedding them in a model-based planning agent, demonstrating efficient performance and generalization to unseen levels on Atari's Pong and Montezuma's Revenge. We release our code and display the learned world models and videos of the agent's gameplay at https://topwasu.github.io/poe-world.
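
The abstract describes the world model as an exponentially-weighted product of programmatic experts. In a generic product of experts, each expert i gives a probability p_i(x) for a candidate outcome x. The combined model is p(x) ∝ ∏_i p_i(x)^θ_i, with nonnegative weights θ_i. The sketch below illustrates only that combination rule. The two Pong-like experts, their weights, the `WALL_X` constant and the `"NOOP"` action are hypothetical hand-written stand-ins, not programs synthesized by PoE-World. The weights are also set by hand rather than fitted to observations.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical state: a Pong-like ball with horizontal position and velocity.
State = Dict[str, int]
# An expert maps (state, action) to probabilities over candidate next values.
Expert = Callable[[State, str], Dict[int, float]]

WALL_X = 10  # hypothetical right-wall position


def ball_moves_with_velocity(state: State, action: str) -> Dict[int, float]:
    """Hypothetical expert: the ball usually keeps moving at its velocity."""
    x, vx = state["x"], state["vx"]
    return {x + vx: 0.8, x: 0.1, x - vx: 0.1}


def ball_bounces_at_wall(state: State, action: str) -> Dict[int, float]:
    """Hypothetical expert: near the wall the ball reverses direction;
    elsewhere it is uninformative (uniform over the three candidates)."""
    x, vx = state["x"], state["vx"]
    if x + vx > WALL_X:
        return {x - vx: 0.9, x: 0.05, x + vx: 0.05}
    return {x - vx: 1 / 3, x: 1 / 3, x + vx: 1 / 3}


def product_of_experts(
    experts: Sequence[Expert],
    weights: Sequence[float],
    state: State,
    action: str,
    candidates: List[int],
) -> Dict[int, float]:
    """Return p(c) proportional to prod_i p_i(c) ** w_i, normalized over candidates."""
    assert len(experts) == len(weights) and all(w >= 0 for w in weights)
    predictions = [expert(state, action) for expert in experts]
    log_scores: Dict[int, float] = {}
    for c in candidates:
        total = 0.0
        for pred, w in zip(predictions, weights):
            if w == 0:
                continue  # p ** 0 == 1: a zero-weight expert has no effect
            p = pred.get(c, 0.0)
            if p <= 0.0:
                total = -math.inf  # a positively weighted expert vetoes c
                break
            total += w * math.log(p)
        log_scores[c] = total
    best = max(log_scores.values())
    if best == -math.inf:
        raise ValueError("every candidate was ruled out by some expert")
    # Subtract the max before exponentiating for numerical stability.
    unnormalized = {c: math.exp(s - best) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: u / z for c, u in unnormalized.items()}


experts = [ball_moves_with_velocity, ball_bounces_at_wall]
weights = [1.0, 1.0]

far = product_of_experts(experts, weights, {"x": 5, "vx": 1}, "NOOP", [4, 5, 6])
near = product_of_experts(experts, weights, {"x": 9, "vx": 2}, "NOOP", [7, 9, 11])
print(far)   # the uniform expert cancels out: follows ball_moves_with_velocity
print(near)  # the bounce expert shifts most mass to x - vx = 7
```

An expert that is uniform over the candidates contributes the same factor to every outcome, so it drops out after normalization. An expert with positive weight that assigns zero probability to an outcome vetoes that outcome. Working in log space avoids underflow when many experts are multiplied together.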

Wasu Top Piriyakulkij, Yichao Liang, Hao Tang, Adrian Weller, Marta Kryven, Kevin Ellis • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| One-step next-observation prediction | WebShop (test) | Token F1 | 60 | 16 |
| One-step next-observation prediction | ALFWorld (test) | Token F1 | 62 | 16 |
| One-step next-observation prediction | AgentGym Unweighted Average (test) | Token F1 | 65 | 16 |
| One-step next-observation prediction | BabyAI (test) | Token F1 | 78 | 16 |
| One-step next-observation prediction | TextCraft (test) | Token F1 | 88 | 16 |
| One-step next-observation prediction | Maze (test) | Token F1 | 86 | 16 |
| One-step next-observation prediction | Wordle (test) | Token F1 | 0.55 | 16 |
| One-step next-observation prediction | SciWorld (test) | Token F1 | 41 | 16 |
| Planning | AlfWorld, BabyAI, Maze, SciWorld, TextCraft, WebShop, Wordle (held-out) | AlfWorld Success Rate | 3.5 | 7 |
| Multi-step rollout prediction | 7 Environments (AlfWorld, BabyAI, Maze, SciWorld, TextCraft, WebShop, Wordle) (held-out episodes) | Token F1 (t=1) | 63 | 5 |
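
Most results above are reported as Token F1, but the listing does not define the metric. A common definition, used for example in SQuAD-style evaluation, treats the predicted and reference observations as bags of tokens. It then takes the harmonic mean of token precision and recall. The sketch below assumes that definition with lowercase whitespace tokenization. The benchmarks listed here may tokenize differently or report the score on a 0 to 100 scale.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Bag-of-tokens F1 between a predicted and a reference observation
    (assumed definition: lowercase, whitespace tokenization)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("you see a red door", "you see a blue door"))
```

In this example, four of the five tokens match in each direction, so precision and recall are both 4/5 and the F1 score is 0.8.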
