
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models

About

Behavioral Foundation Models (BFMs) produce agents with the capability to adapt to any unknown reward or task. These methods, however, are only able to produce near-optimal policies for reward functions that lie in the span of some pre-existing state features, making the choice of state features crucial to the expressivity of the BFM. As a result, BFMs are trained using a variety of complex objectives and require sufficient dataset coverage to train task-useful spanning features. In this work, we examine the question: are these complex representation learning objectives necessary for zero-shot RL? Specifically, we revisit the objective of self-supervised next-state prediction in latent space for state feature learning, but observe that such an objective alone is prone to increasing state-feature similarity and, subsequently, reducing span. We propose an approach, Regularized Latent Dynamics Prediction (RLDP), that adds a simple orthogonality regularization to maintain feature diversity and can match or surpass state-of-the-art complex representation learning methods for zero-shot RL. Furthermore, we empirically show that prior approaches perform poorly in low-coverage scenarios where RLDP still succeeds.
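The abstract does not give the exact form of the objective, so the following is only a rough sketch of the general idea: a latent next-state prediction error plus a penalty that discourages state features from becoming similar to one another. The function names, the squared-cosine-similarity penalty, and the `reg_weight` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    n = math.sqrt(dot(v, v)) or 1.0
    return [a / n for a in v]

def prediction_loss(pred_next, target_next):
    # Mean squared error between predicted and target next-state latents.
    total = 0.0
    for p, t in zip(pred_next, target_next):
        total += sum((a - b) ** 2 for a, b in zip(p, t))
    return total / len(pred_next)

def orthogonality_penalty(features):
    # Assumed regularizer: mean squared cosine similarity over all
    # distinct pairs of state features in the batch.
    unit = [normalize(f) for f in features]
    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += dot(unit[i], unit[j]) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0

def rldp_style_loss(pred_next, target_next, features, reg_weight=1.0):
    # Hypothetical combined objective: prediction error plus weighted penalty.
    return (prediction_loss(pred_next, target_next)
            + reg_weight * orthogonality_penalty(features))

# Collapsed features are penalized even when prediction is perfect.
collapsed = [[1.0, 0.0], [2.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0]]
print(orthogonality_penalty(collapsed), orthogonality_penalty(diverse))
```

In this toy setting, two features pointing in the same direction receive the maximum penalty while orthogonal features receive none, which mirrors the failure mode the abstract describes: prediction alone can be satisfied by increasingly similar features, and the extra term pushes back against that.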

Pranaya Jajoo, Harshit Sikchi, Siddhant Agarwal, Amy Zhang, Scott Niekum, Martha White • 2026

Related benchmarks

Task | Dataset | Result | Rank
Offline Reinforcement Learning | halfcheetah medium v2 | Average Score 49.08 | 27
Offline Reinforcement Learning | walker2d medium v2 | Normalized Score 83.83 | 18
Offline Reinforcement Learning | halfcheetah medium-expert v2 | Normalized Score 86.03 | 18
Offline Reinforcement Learning | hopper medium v2 | -- | 14
Offline Reinforcement Learning | Walker2d Medium-Expert v2 | Average Score 103.9 | 7
Offline Reinforcement Learning | Hopper Medium-Expert v2 | Average Score 77.21 | 7
Offline Reinforcement Learning | DeepMind Control Suite Walker (test) | Stand Score 877.7 | 5
Offline Reinforcement Learning | DeepMind Control Suite Cheetah (test) | Run Score 236.3 | 5
Offline Reinforcement Learning | DeepMind Control Suite Quadruped (test) | Stand Score 794.9 | 5
Offline Reinforcement Learning | Pointmass DeepMind Control Suite (test) | Performance (Top Left) 890.4 | 5
Showing 10 of 14 rows
