
Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

About

Offline reinforcement learning (RL) has advanced through return-conditioned supervised learning (RCSL), but RCSL's lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which combines the stability of RCSL with the stitching capability of $Q$-functions. Based on an analysis of $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function according to the trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently matching or exceeding the maximum trajectory return in the dataset across diverse offline RL benchmarks.

Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung · 2024
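The abstract describes QCS only at a high level: an RCSL objective to which a $Q$-aid term is added with a weight that depends on the trajectory return. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The linear schedule `q_aid_weight`, the coefficient `lam`, and the squared-error action loss are hypothetical choices, assuming that lower-return trajectories (where stitching matters most) receive more $Q$-aid than high-return ones. `returns_to_go` computes the conditioning signal an RCSL policy receives. NumPy only evaluates the loss value here; real training would use an autodiff framework so that gradients of the $Q$ term reach the policy.

```python
import numpy as np


def returns_to_go(rewards, gamma=1.0):
    """Return-to-go R_t = sum_{k>=t} gamma^(k-t) * r_k for one trajectory.

    An RCSL policy is conditioned on this value at every timestep.
    """
    rtg = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def q_aid_weight(traj_return, r_min, r_max, lam=1.0):
    """HYPOTHETICAL linear schedule: weight lam for the lowest-return
    trajectory in the dataset, 0 for the highest-return one."""
    frac = (r_max - traj_return) / max(r_max - r_min, 1e-8)
    return lam * float(np.clip(frac, 0.0, 1.0))


def qcs_style_loss(pred_actions, data_actions, q_values,
                   traj_return, r_min, r_max, lam=1.0):
    """RCSL action-regression loss plus a return-weighted Q-aid term.

    pred_actions: actions from the return-conditioned policy, shape (T, A)
    data_actions: actions stored in the dataset, shape (T, A)
    q_values: Q(s_t, pred_action_t) from a separately trained critic, shape (T,)
    """
    # Supervised (behaviour-cloning style) term of RCSL.
    rcsl_term = np.mean(np.sum((pred_actions - data_actions) ** 2, axis=-1))
    # Q-aid: maximizing Q means subtracting it from the loss, scaled by
    # a weight that depends on the trajectory's return (assumed schedule).
    w = q_aid_weight(traj_return, r_min, r_max, lam)
    return rcsl_term - w * np.mean(q_values)


# Usage on a toy 3-step trajectory (all numbers are made up).
rewards = np.array([1.0, 0.0, 2.0])
traj_return = returns_to_go(rewards)[0]  # 3.0 with gamma = 1
pred = np.array([[0.1], [0.0], [-0.2]])
data = np.array([[0.0], [0.0], [0.0]])
q = np.array([1.0, 1.5, 2.0])
print(qcs_style_loss(pred, data, q, traj_return, r_min=0.0, r_max=10.0))
```

The return-dependent weight is the design point the abstract emphasizes: it lets the supervised RCSL term dominate on trajectories that are already good while leaning on the critic where stitching could help. How the actual paper shapes this weight is not stated on this page.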

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | hopper medium | Normalized Score | 96.4 | 68 |
| Offline Reinforcement Learning | walker2d medium-replay | Normalized Score | 94.1 | 61 |
| Offline Reinforcement Learning | walker2d medium | Normalized Score | 88.2 | 61 |
| Offline Reinforcement Learning | hopper medium-replay | Normalized Score | 100.4 | 55 |
| Offline Reinforcement Learning | halfcheetah medium-replay | Normalized Score | 54.1 | 54 |
| Offline Reinforcement Learning | halfcheetah medium | Normalized Score | 59 | 53 |
| Offline Reinforcement Learning | D4RL Adroit pen (human) | Normalized Return | 83.9 | 53 |
| Offline Reinforcement Learning | D4RL Adroit pen (cloned) | Normalized Return | 66.5 | 53 |
| Offline Reinforcement Learning | antmaze medium-play | Score | 84.8 | 44 |
| Offline Reinforcement Learning | Walker2d medium-expert | Normalized Score | 116.6 | 42 |

(10 of 39 benchmark rows shown.)
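The "Normalized Score" entries presumably follow the standard D4RL convention, which rescales a raw episode return so that a random policy scores 0 and an expert reference policy scores 100; values above 100, such as 116.6 on walker2d medium-expert, would then exceed the expert reference. A minimal sketch of that rescaling follows. The reference returns in the usage line are made-up placeholders, since the real per-environment values are not given on this page. As far as we know, the `d4rl` package exposes the unscaled fraction (before the ×100) through `env.get_normalized_score`.

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    """Map a raw episode return onto the D4RL scale: 0 = random, 100 = expert."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


# Illustrative reference values only; real ones are environment-specific.
print(d4rl_normalized_score(raw_return=1300.0, random_return=100.0,
                            expert_return=1100.0))  # 120.0
```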
