Simulus: Combining Improvements in Sample-Efficient World Model Agents

About

World models (WMs) represent the frontier of sample-efficient reinforcement learning, but their complexity leaves many promising improvements unrealized due to the significant expertise and effort required to identify and integrate them. Inspired by Rainbow, which showed that individually known improvements to DQN complement each other and can be effectively combined, we take on this challenge and ask whether the same principle applies to world model agents. We introduce Simulus, a modular token-based WM agent that integrates: (1) a flexible tokenization framework supporting arbitrary combinations of observation and action modalities; (2) intrinsic motivation for epistemic uncertainty reduction; (3) prioritized world model replay; and (4) regression-as-classification for reward and return prediction. Simulus achieves state-of-the-art sample efficiency for planning-free WMs across three diverse benchmarks: visual Atari 100K, continuous-control DMC Proprioception 500K, and symbolic Craftax-1M. Notably, intrinsic motivation proves beneficial even under the tight interaction budgets of sample-efficient RL, despite the risk of wasting scarce interactions on task-irrelevant experience. Ablation studies reveal that each component contributes individually, and their combination yields synergistic gains. Our code and model weights are publicly available at https://github.com/leor-c/Simulus.
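The abstract names regression-as-classification for reward and return prediction but does not describe the exact scheme. The sketch below shows one common form of the idea, two-hot encoding over fixed bins, and should not be read as the Simulus implementation. The bin range, the bin count, and all function names (`make_bins`, `two_hot`, `cross_entropy`, `decode`) are assumptions made for illustration.

```python
import math

def make_bins(low, high, num_bins):
    """Evenly spaced bin centers on [low, high] (an assumed, illustrative choice)."""
    step = (high - low) / (num_bins - 1)
    return [low + i * step for i in range(num_bins)]

def two_hot(value, bins):
    """Spread a scalar over its two neighboring bins so the expectation recovers it."""
    value = min(max(value, bins[0]), bins[-1])  # clamp to the supported range
    step = bins[1] - bins[0]
    # Index of the bin center at or just below the value (capped so lower + 1 exists).
    lower = min(int((value - bins[0]) / step), len(bins) - 2)
    upper_weight = (value - bins[lower]) / step
    target = [0.0] * len(bins)
    target[lower] = 1.0 - upper_weight
    target[lower + 1] = upper_weight
    return target

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Classification loss that replaces a squared-error regression loss."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)

def decode(logits, bins):
    """Turn predicted bin probabilities back into a scalar via the expectation."""
    probs = softmax(logits)
    return sum(p * b for p, b in zip(probs, bins))

bins = make_bins(-2.0, 2.0, 5)       # centers: -2, -1, 0, 1, 2
target = two_hot(0.25, bins)         # weight split between the bins at 0 and 1
loss = cross_entropy([0.0] * 5, target)
estimate = decode([0.0] * 5, bins)
```

In this sketch the predictor outputs one logit per bin, is trained with cross-entropy against the two-hot target, and is read out as the probability-weighted mean of the bin centers. This keeps the loss scale independent of the magnitude of the reward or return.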

Lior Cohen, Kaixin Wang, Bingyi Kang, Uri Gadot, Shie Mannor • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reinforcement Learning | Atari 100k | Alien Score: 687.2 | 41 |
| Reinforcement Learning | Atari 100K (test) | Mean Score: 1.609 | 21 |
| Reinforcement Learning | Craftax 1M environment interactions latest | Return (%): 6.59 | 3 |