
State Diversity Matters in Offline Behavior Distillation

About

Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be applied across various downstream RL tasks. In this paper, we uncover a misalignment between original and distilled datasets: a high-quality original dataset does not necessarily yield a superior synthetic dataset. Through an empirical analysis of policy performance under varying levels of training loss, we show that datasets with greater state diversity outperform those with higher state quality when training loss is substantial, as is often the case in OBD, whereas the relationship reverses under minimal loss; this gap contributes to the misalignment. By associating state quality and diversity with reductions in pivotal and surrounding error, respectively, our theoretical analysis establishes that surrounding error plays a more crucial role in policy performance when pivotal error is large, highlighting the importance of state diversity in the OBD scenario. Furthermore, we propose a novel yet simple algorithm, state density weighted (SDW) OBD, which emphasizes state diversity by weighting the distillation objective with the reciprocal of state density, thereby distilling more diverse state information into the synthetic data. Extensive experiments across multiple D4RL datasets confirm that SDW significantly improves OBD performance when the original dataset exhibits limited state diversity.
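The core of SDW as described above is weighting the distillation objective by the reciprocal of the estimated state density, so that rare states contribute more to the loss. The abstract does not specify how density is estimated; the sketch below is a minimal illustration using a Gaussian kernel density estimate, and the helper name `sdw_weights` is hypothetical, not from the paper.

```python
import numpy as np
from scipy.stats import gaussian_kde


def sdw_weights(states, eps=1e-8):
    """Per-state weights proportional to inverse estimated state density.

    Illustrative sketch only: the paper's density estimator may differ.
    """
    # gaussian_kde expects data with shape (dim, n_samples).
    kde = gaussian_kde(states.T)
    density = kde(states.T)          # estimated density at each state, shape (n,)
    w = 1.0 / (density + eps)        # rare states receive larger weight
    return w / w.mean()              # normalize so the overall loss scale is unchanged


# Toy example: 200 two-dimensional states, mostly clustered near the origin,
# with a small sparse cluster far away (i.e., limited state diversity).
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(180, 2))
sparse = rng.normal(3.0, 1.0, size=(20, 2))
states = np.vstack([dense, sparse])

w = sdw_weights(states)
# States in the sparse region get higher average weight than dense-region states.
print(w[:180].mean() < w[180:].mean())
```

These weights would then multiply the per-sample distillation loss (e.g., a weighted behavioral-cloning objective), upweighting under-represented regions of the state space.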

Shiye Lei, Zhihao Cheng, Dacheng Tao · 2025

Related benchmarks

Task                           Dataset                          Metric             Result  Rank
Offline Behavior Distillation  D4RL halfcheetah-medium          Normalized Return  39.5    8
Offline Behavior Distillation  D4RL hopper-medium               Normalized Return  38.4    8
Offline Behavior Distillation  D4RL hopper-medium-expert        Normalized Return  42.6    8
Offline Behavior Distillation  D4RL walker2d-medium             Normalized Return  42.5    8
Offline Behavior Distillation  D4RL walker2d-medium-expert      Normalized Return  44.6    8
Offline Behavior Distillation  D4RL halfcheetah-medium-expert   Normalized Return  25.0    8
