# Latent Particle World Models: Self-supervised Object-centric Stochastic Dynamics Modeling

## About
We introduce the Latent Particle World Model (LPWM), a self-supervised object-centric world model that scales to real-world multi-object datasets and is applicable to decision-making. LPWM autonomously discovers keypoints, bounding boxes, and object masks directly from video, learning rich scene decompositions without supervision. The architecture is trained end-to-end purely from videos and supports flexible conditioning on actions, language, and image goals. LPWM models stochastic particle dynamics via a novel latent action module and achieves state-of-the-art results on diverse real-world and synthetic datasets. Beyond stochastic video modeling, LPWM is readily applicable to decision-making tasks such as goal-conditioned imitation learning, as we demonstrate in the paper. Code, data, pre-trained models, and video rollouts are available at: https://taldatech.github.io/lpwm-web
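To make the particle-dynamics idea concrete, below is a minimal sketch of one prediction step: a stochastic latent action is inferred from consecutive sets of particle states, broadcast to every particle, and a transformer over particles predicts residual updates. All module names, dimensions, and the transformer-based interaction model here are our illustrative assumptions, not LPWM's actual implementation; see the repository linked above for the real code.

```python
# Hypothetical sketch of a particle-based latent dynamics step.
# Module names, dimensions, and architecture choices are assumptions
# for illustration only, not LPWM's actual code.
import torch
import torch.nn as nn

class LatentActionModule(nn.Module):
    """Infers a stochastic latent summarizing the transition between frames."""
    def __init__(self, particle_dim: int, action_dim: int):
        super().__init__()
        self.to_stats = nn.Linear(2 * particle_dim, 2 * action_dim)

    def forward(self, particles_t, particles_tp1):
        # Pool particles from both frames, then predict a Gaussian latent.
        pooled = torch.cat(
            [particles_t.mean(dim=1), particles_tp1.mean(dim=1)], dim=-1)
        mu, log_var = self.to_stats(pooled).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization
        return z, mu, log_var

class ParticleDynamics(nn.Module):
    """Predicts next-step particle states from particles and a latent action."""
    def __init__(self, particle_dim: int, action_dim: int, n_heads: int = 4):
        super().__init__()
        self.action_proj = nn.Linear(action_dim, particle_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=particle_dim, nhead=n_heads, batch_first=True)
        self.interaction = nn.TransformerEncoder(layer, num_layers=2)
        self.delta = nn.Linear(particle_dim, particle_dim)

    def forward(self, particles, z_action):
        # Broadcast the latent action to every particle, let particles
        # interact via self-attention, then predict a residual update.
        h = particles + self.action_proj(z_action).unsqueeze(1)
        h = self.interaction(h)
        return particles + self.delta(h)

if __name__ == "__main__":
    B, K, D, A = 2, 8, 64, 16      # batch, particles, feature dim, latent dim
    p_t   = torch.randn(B, K, D)   # particles at time t (from an encoder)
    p_tp1 = torch.randn(B, K, D)   # particles at time t+1 (training targets)
    lam = LatentActionModule(D, A)
    dyn = ParticleDynamics(D, A)
    z, mu, log_var = lam(p_t, p_tp1)   # infer stochastic latent action
    p_pred = dyn(p_t, z)               # predict next particle states
    print(p_pred.shape)                # torch.Size([2, 8, 64])
```

At rollout time, a prior over the latent action (or a ground-truth action, language, or image-goal embedding, per the conditioning options above) would replace the posterior inferred from the target frame.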
## Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Video Prediction | BAIR 64x64 (test) | FVD | 89.4 | 27 |
| Goal-conditioned imitation learning | OGBench-Scene (test) | Success Rate (%) | 100 | 9 |
| Goal-conditioned imitation learning | PandaPush (test) | Success Rate (%) | 92.7 | 9 |
| Goal-conditioned robotic manipulation | OGBench Visual Scene Play v0 | Task 1 Success Rate (%) | 100 | 7 |
| Goal-conditioned multi-object manipulation | PandaPush 2 Cubes | Success Rate (%) | 74 | 6 |
| Goal-conditioned multi-object manipulation | PandaPush 1 Cube | Success Rate (%) | 92.7 | 6 |
| Goal-conditioned multi-object manipulation | PandaPush 3 Cubes | Success Rate (%) | 62.1 | 6 |