
Performance Asymmetry in Model-Based Reinforcement Learning

About

Recently, Model-Based Reinforcement Learning (MBRL) has achieved super-human performance on the Atari100k benchmark on average. However, we discover that conventional aggregates mask a major problem, Performance Asymmetry: MBRL agents dramatically outperform humans on certain tasks (Agent-Optimal tasks) while drastically underperforming humans on others (Human-Optimal tasks). Indeed, despite achieving SOTA in overall mean Human-Normalized Score (HNS), the SOTA agent scores the worst among baselines on Human-Optimal tasks, with a striking 21X performance gap between the Human-Optimal and Agent-Optimal subsets. To address this, we partition Atari100k evenly into Human-Optimal and Agent-Optimal subsets and introduce a more balanced aggregate, Sym-HNS. Furthermore, we trace the striking Performance Asymmetry of the SOTA pixel diffusion world model to the curse of dimensionality and to its prowess on tasks with high visual detail (e.g., Breakout). Motivated by this, we propose a novel latent end-to-end Joint Embedding DIffusion (JEDI) world model that achieves SOTA results on Sym-HNS, Human-Optimal tasks, and Breakout -- thus reversing the worsening Performance Asymmetry trend while improving computational efficiency and remaining competitive on the full Atari100k benchmark.
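Since the abstract centers on HNS and a subset-balanced aggregate, a minimal sketch may help. The `hns` helper below uses the standard Atari human normalization; `balanced_aggregate` is a hypothetical stand-in for Sym-HNS (a harmonic mean of the two subset means), since the abstract does not give the exact formula.

```python
# Sketch of the quantities discussed in the abstract. The HNS formula is
# the standard Atari normalization; the balanced aggregate is an
# ILLUSTRATIVE assumption, not the authors' exact Sym-HNS definition.

def hns(agent_score, random_score, human_score):
    """Human-Normalized Score: 0 = random play, 1 = human level."""
    return (agent_score - random_score) / (human_score - random_score)

def balanced_aggregate(human_optimal_hns, agent_optimal_hns):
    """Harmonic mean of the per-subset mean HNS, so a huge lead on
    Agent-Optimal games cannot hide a deficit on Human-Optimal games
    (hypothetical stand-in for Sym-HNS)."""
    mean_h = sum(human_optimal_hns) / len(human_optimal_hns)
    mean_a = sum(agent_optimal_hns) / len(agent_optimal_hns)
    return 2.0 * mean_h * mean_a / (mean_h + mean_a)

# Toy numbers: the agent lags humans on Human-Optimal games (HNS < 1)
# but crushes them on Agent-Optimal games (HNS >> 1).
human_opt = [hns(100, 0, 1000), hns(200, 0, 1000)]    # 0.1, 0.2
agent_opt = [hns(5000, 0, 1000), hns(8000, 0, 1000)]  # 5.0, 8.0

plain_mean = sum(human_opt + agent_opt) / 4   # ~3.33: asymmetry hidden
balanced = balanced_aggregate(human_opt, agent_opt)   # ~0.29: exposed
```

The point of the toy example: a plain mean over all games rewards extreme scores on Agent-Optimal games, while a subset-balanced aggregate surfaces the Human-Optimal deficit the abstract describes.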

Jing Yu Lim, Rushi Shah, Zarif Ikram, Samson Yu, Haozhe Ma, Tze-Yun Leong, Dianbo Liu • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Reinforcement Learning | Atari 100K (test) | Mean Score: 2.425 | 21 |
| Reinforcement Learning | Atari 100k | -- | 18 |
| Reinforcement Learning | Atari 100k steps (overall) | Game Score (Boxing): 91.6 | 9 |
| Reinforcement Learning | Atari Breakout 100k (test) | HNS: 535 | 6 |
| Reinforcement Learning | Atari Assault 100k (test) | HNS: 2.26 | 6 |
