GTA: Generative Trajectory Augmentation with Guidance for Offline Reinforcement Learning

About

Offline reinforcement learning (offline RL) poses the challenge of learning effective decision-making policies from static datasets, without any online interaction. Data augmentation techniques, such as noise injection and data synthesis, aim to improve Q-function approximation by smoothing the learned state-action region. However, these methods often fall short of directly improving the quality of offline datasets, leading to suboptimal results. In response, we introduce GTA (Generative Trajectory Augmentation), a novel generative data augmentation approach designed to enrich offline data by augmenting trajectories to be both high-rewarding and dynamically plausible. GTA applies a diffusion model within the data augmentation framework: it partially noises original trajectories and then denoises them with classifier-free guidance, conditioning on an amplified return value. Our results show that GTA, as a general data augmentation strategy, enhances the performance of widely used offline RL algorithms across various tasks with unique challenges. Furthermore, we conduct a quality analysis of data augmented by GTA and demonstrate that GTA improves the quality of the data. Our code is available at https://github.com/Jaewoopudding/GTA
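
The noise-then-denoise procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a DDPM-style linear noise schedule and ancestral sampler as stand-ins (the paper's diffusion formulation may differ), and `toy_eps_model`, `noise_level`, `amplify`, and `w` are hypothetical names chosen for illustration. The guidance rule used here is the common classifier-free form, eps = eps_uncond + w * (eps_cond - eps_uncond).

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumption); returns betas and cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return betas, np.cumprod(1.0 - betas)


def partial_noise(traj, t, alpha_bar, rng):
    """Forward-diffuse a clean trajectory to step t, keeping part of the original signal."""
    eps = rng.standard_normal(traj.shape)
    return np.sqrt(alpha_bar[t]) * traj + np.sqrt(1.0 - alpha_bar[t]) * eps


def cfg_epsilon(model, x, t, cond, w):
    """Classifier-free guidance: mix unconditional and conditional noise predictions."""
    eps_uncond = model(x, t, None)
    eps_cond = model(x, t, cond)
    return eps_uncond + w * (eps_cond - eps_uncond)


def toy_eps_model(x, t, cond):
    """Hypothetical stand-in for a trained noise-prediction network."""
    bias = 0.0 if cond is None else -0.01 * cond
    return 0.1 * x + bias


def augment(traj, traj_return, model, noise_level=0.5, amplify=1.2,
            w=2.0, num_steps=50, seed=0):
    """Partially noise a trajectory, then denoise it toward an amplified return."""
    rng = np.random.default_rng(seed)
    betas, alpha_bar = make_schedule(num_steps)
    # Only diffuse part of the way: noise_level in (0, 1] picks the starting step.
    t_start = max(int(noise_level * num_steps) - 1, 0)
    target_return = amplify * traj_return  # condition on a return above the original
    x = partial_noise(traj, t_start, alpha_bar, rng)
    for t in range(t_start, -1, -1):
        eps = cfg_epsilon(model, x, t, target_return, w)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(1.0 - betas[t])
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x, target_return


if __name__ == "__main__":
    # A dummy trajectory: 8 timesteps of 4-dimensional (state, action) features.
    trajectory = np.zeros((8, 4))
    augmented, target = augment(trajectory, traj_return=10.0, model=toy_eps_model)
    print(augmented.shape, target)
```

In this sketch, a smaller `noise_level` keeps the augmented trajectory closer to the original, while a larger `amplify` or `w` pushes the sample harder toward the higher target return. The values shown are placeholders, not settings taken from the paper.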

Jaewoo Lee, Sujin Yun, Taeyoung Yun, Jinkyoo Park • 2024

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | -- | 117 |
| Offline Reinforcement Learning | D4RL hopper-medium-expert | -- | 115 |
| Offline Reinforcement Learning | D4RL walker2d-medium-expert | -- | 86 |
| Offline Reinforcement Learning | D4RL halfcheetah v2 (medium-replay) | -- | 58 |
| Offline Reinforcement Learning | D4RL Medium Walker2d | -- | 58 |
| Offline Reinforcement Learning | D4RL walker2d medium-replay | -- | 45 |
| Offline Reinforcement Learning | D4RL Locomotion medium, medium-replay, medium-expert v2 | Score (HalfCheetah, Medium): 63.76 | 34 |
| Offline Reinforcement Learning | D4RL Maze2D | -- | 15 |
| Dexterous Hand Control | Adroit | Overall Avg Success Rate: 42.73 | 13 |
| Offline Reinforcement Learning | VD4RL Cheetah-run pixel-based (medium-replay) | Normalized Score: 38.1 | 8 |
Showing 10 of 20 rows
