Adaptive $Q$-Aid for Conditional Supervised Learning in Offline Reinforcement Learning
About
Offline reinforcement learning (RL) has progressed with return-conditioned supervised learning (RCSL), but its lack of stitching ability remains a limitation. We introduce $Q$-Aided Conditional Supervised Learning (QCS), which effectively combines the stability of RCSL with the stitching capability of $Q$-functions. By analyzing $Q$-function over-generalization, which impairs stable stitching, QCS adaptively integrates $Q$-aid into RCSL's loss function based on trajectory return. Empirical results show that QCS significantly outperforms RCSL and value-based methods, consistently achieving or exceeding the maximum trajectory returns across diverse offline RL benchmarks.
Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung • 2024
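To make the idea of return-adaptive $Q$-aid concrete, below is a minimal PyTorch-style sketch of how a $Q$-maximization term could be folded into an RCSL loss with a weight that shrinks as a trajectory's return approaches the dataset maximum. The function name, arguments, and the linear weighting rule are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import torch

def qcs_loss(policy_actions, dataset_actions, q_values, traj_return,
             return_min, return_max, base_weight=1.0):
    """Sketch of a Q-aided conditional supervised loss.

    Combines an RCSL behavior-cloning term with a Q-maximization term
    whose weight decreases as the trajectory return approaches the
    dataset maximum: near-optimal trajectories need little stitching
    aid, while suboptimal ones benefit from the Q-function's guidance.
    The linear weighting rule here is an assumption for illustration.
    """
    # RCSL term: imitate dataset actions conditioned on return-to-go.
    rcsl_loss = torch.mean((policy_actions - dataset_actions) ** 2)

    # Adaptive Q-aid weight: larger for lower-return trajectories.
    norm_return = (traj_return - return_min) / (return_max - return_min + 1e-8)
    q_weight = base_weight * (1.0 - norm_return)

    # Q-aid term: push the policy toward actions with high estimated Q-values.
    q_aid_loss = -torch.mean(q_values)

    return rcsl_loss + q_weight * q_aid_loss
```

In this sketch, a trajectory at the dataset's maximum return receives zero $Q$-aid (pure RCSL), while the lowest-return trajectory receives the full `base_weight`, matching the paper's motivation that $Q$-function over-generalization hurts most on high-return data.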
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | antmaze medium-play | Score | 84.8 | 35 |
| Offline Reinforcement Learning | D4RL Adroit pen (human) | Normalized Return | 83.9 | 32 |
| Offline Reinforcement Learning | D4RL Adroit pen (cloned) | Normalized Return | 66.5 | 32 |
| Offline Reinforcement Learning | MuJoCo hopper D4RL (medium-replay) | Normalized Return | 100.4 | 26 |
| Offline Reinforcement Learning | MuJoCo walker2d-medium D4RL | Normalized Return | 88.2 | 20 |
| Offline Reinforcement Learning | MuJoCo halfcheetah-medium-replay D4RL | Normalized Return | 54.1 | 20 |
| Offline Reinforcement Learning | MuJoCo walker2d medium-replay D4RL | Normalized Return | 94.1 | 20 |
| Offline Reinforcement Learning | MuJoCo halfcheetah-medium D4RL | Normalized Return | 59 | 20 |
| Offline Reinforcement Learning | MuJoCo walker2d medium-expert D4RL | Normalized Return | 116.6 | 18 |
| Offline Reinforcement Learning | MuJoCo halfcheetah-medium-expert D4RL | Normalized Return | 93.3 | 18 |
*(10 of 17 benchmark rows shown.)*