
Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning

About

Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
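The key mechanism described above — weighting a Q-function term into the RCSL loss according to how good the trajectory already is — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear weighting rule, the function names, and the min/max-return normalization are all assumptions for exposition.

```python
import numpy as np

def adaptive_q_weight(traj_return, min_return, max_return):
    """Hypothetical adaptation rule: apply more Q-aid to low-return
    trajectories (where stitching toward better behavior helps) and
    fall back to pure RCSL as the trajectory return approaches the
    dataset maximum (where the Q-function tends to over-generalize)."""
    frac = (traj_return - min_return) / max(max_return - min_return, 1e-8)
    return float(np.clip(1.0 - frac, 0.0, 1.0))

def qcs_loss(rcsl_loss, q_loss, traj_return, min_return, max_return):
    """Combined objective: the RCSL supervised term plus a
    return-adaptive Q-aid term (illustrative sketch only)."""
    w = adaptive_q_weight(traj_return, min_return, max_return)
    return rcsl_loss + w * q_loss
```

Under this sketch, a trajectory at the dataset's maximum return gets weight 0 (pure RCSL), while the worst trajectory gets weight 1 (full Q-aid); the actual adaptation schedule used by QCS should be taken from the paper.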

Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | antmaze medium-play | Score | 84.8 | 35 |
| Offline Reinforcement Learning | D4RL Adroit pen (human) | Normalized Return | 83.9 | 32 |
| Offline Reinforcement Learning | D4RL Adroit pen (cloned) | Normalized Return | 66.5 | 32 |
| Offline Reinforcement Learning | MuJoCo hopper D4RL (medium-replay) | Normalized Return | 100.4 | 26 |
| Offline Reinforcement Learning | MuJoCo walker2d-medium D4RL | Normalized Return | 88.2 | 20 |
| Offline Reinforcement Learning | MuJoCo halfcheetah-medium-replay D4RL | Normalized Return | 54.1 | 20 |
| Offline Reinforcement Learning | MuJoCo walker2d medium-replay D4RL | Normalized Return | 94.1 | 20 |
| Offline Reinforcement Learning | MuJoCo halfcheetah-medium D4RL | Normalized Return | 59 | 20 |
| Offline Reinforcement Learning | MuJoCo walker2d medium-expert D4RL | Normalized Return | 116.6 | 18 |
| Offline Reinforcement Learning | MuJoCo halfcheetah-medium-expert D4RL | Normalized Return | 93.3 | 18 |

Showing 10 of 17 rows

Other info

Code
