
Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

About

Offline goal-conditioned reinforcement learning (GCRL) is a practical reinforcement learning paradigm that aims to learn goal-conditioned policies from reward-free offline data. Despite recent advances in hierarchical architectures such as HIQL, long-horizon control in offline GCRL remains challenging due to the limited expressiveness of Gaussian policies and the inability of high-level policies to generate effective subgoals. To address these limitations, we propose the goal-conditioned mean flow policy, which introduces an average velocity field into hierarchical policy modeling for offline GCRL. Specifically, the mean flow policy captures complex target distributions for both the high-level and low-level policies through a learned average velocity field, enabling efficient action generation via one-step sampling. Furthermore, to address the insufficiency of goal representations, we introduce a LeJEPA loss that repels goal representation embeddings during training, thereby encouraging more discriminative representations and improving generalization. Experimental results show that our method achieves strong performance on both state-based and pixel-based tasks in the OGBench benchmark.

Zhiqiang Dong, Teng Pang, Rongjian Xu, Guoqiang Wu · 2026
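The one-step sampling claim in the abstract follows from how an average velocity field is defined. A minimal sketch, assuming a MeanFlow-style convention with a straight path z_t = (1 - t)·x + t·eps, noise at t = 1 and data at t = 0: if u(z_t, r, t) is the average of the instantaneous velocity over [r, t], then z_r = z_t - (t - r)·u(z_t, r, t), so one evaluation at (r, t) = (0, 1) maps noise directly to a sample. The paper learns u with a network conditioned on state and goal; here, purely for illustration, that network is replaced by a closed-form oracle for a hypothetical 1-D Gaussian action distribution N(mu, s²). All function names are hypothetical, and the paper's time convention may differ.

```python
import numpy as np

# Hypothetical toy setting: for one fixed (state, goal) pair, the target action
# distribution is a 1-D Gaussian N(mu, s^2). For this target the average velocity
# has a closed form, so it stands in for a learned network u_theta(z, r, t | state, goal).

def _marginal_stats(t, mu, s):
    """Mean and std of z_t = (1 - t) * x + t * eps, with x ~ N(mu, s^2), eps ~ N(0, 1) independent."""
    mean = (1.0 - t) * mu
    std = np.sqrt((1.0 - t) ** 2 * s ** 2 + t ** 2)
    return mean, std

def oracle_avg_velocity(z_t, r, t, mu, s):
    """Average velocity u(z_t, r, t) = (z_t - z_r) / (t - r) along the ODE trajectory (requires r < t).

    In 1-D the probability-flow ODE is monotone, so it carries z_t to z_r by
    preserving the standardized value between the two Gaussian marginals.
    """
    m_t, sd_t = _marginal_stats(t, mu, s)
    m_r, sd_r = _marginal_stats(r, mu, s)
    z_r = m_r + (sd_r / sd_t) * (z_t - m_t)
    return (z_t - z_r) / (t - r)

def mean_flow_step(z_t, r, t, mu, s):
    """Jump from time t to time r with a single evaluation: z_r = z_t - (t - r) * u(z_t, r, t)."""
    return z_t - (t - r) * oracle_avg_velocity(z_t, r, t, mu, s)

def one_step_sample(n, mu, s, rng):
    """Draw n actions: sample noise at t = 1 and jump straight to t = 0."""
    eps = rng.standard_normal(n)
    return mean_flow_step(eps, 0.0, 1.0, mu, s)

rng = np.random.default_rng(0)
actions = one_step_sample(5, mu=2.0, s=0.5, rng=rng)
```

Composing two half-steps (1 → 0.5 → 0) gives the same result as one full step, which is the consistency property an average velocity field provides; a single Euler step with the instantaneous velocity generally does not. In training, u is not available in closed form: differentiating (t - r)·u(z_t, r, t) = ∫_r^t v(z_τ, τ) dτ with respect to t gives u = v - (t - r)·(d/dt)u, which MeanFlow-style methods use as a regression target. How the paper adapts this objective to its high-level and low-level policies is not described on this page.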
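The abstract describes the LeJEPA loss only as repelling goal embeddings. To our understanding, LeJEPA's regularizer (SIGReg) projects a batch of embeddings onto random directions and penalizes each 1-D projection for deviating from a standard Gaussian, which keeps the embeddings from collapsing together. The sketch below is a simplified, NumPy-only version of that idea: it compares the empirical characteristic function of each projection with that of N(0, 1) under a Gaussian weight. The function name `sliced_cf_penalty`, the frequency grid, the weight, and the number of directions are assumptions, and how the paper applies the loss to its goal representations is not specified here.

```python
import numpy as np

def sliced_cf_penalty(Z, num_dirs=64, t_max=3.0, num_t=61, rng=None):
    """Hypothetical penalty that is small when every 1-D projection of Z looks like N(0, 1).

    Z: array of shape (n, d), e.g. a batch of goal embeddings.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = Z.shape
    # Random unit directions that "sketch" the d-dimensional distribution.
    A = rng.standard_normal((d, num_dirs))
    A /= np.linalg.norm(A, axis=0, keepdims=True)
    P = Z @ A                                    # (n, num_dirs) projected embeddings
    t = np.linspace(-t_max, t_max, num_t)        # frequencies for the characteristic function
    tp = P[:, :, None] * t[None, None, :]        # (n, num_dirs, num_t)
    ecf_re = np.cos(tp).mean(axis=0)             # empirical characteristic function, real part
    ecf_im = np.sin(tp).mean(axis=0)             # empirical characteristic function, imaginary part
    target = np.exp(-0.5 * t ** 2)               # characteristic function of N(0, 1)
    weight = np.exp(-0.5 * t ** 2)               # Gaussian weight over frequencies (assumed)
    err = ((ecf_re - target) ** 2 + ecf_im ** 2) * weight   # (num_dirs, num_t)
    # Trapezoidal rule on the uniform frequency grid, then average over directions.
    dt = t[1] - t[0]
    integral = dt * (err.sum(axis=1) - 0.5 * (err[:, 0] + err[:, -1]))
    return float(integral.mean())

rng = np.random.default_rng(0)
spread = sliced_cf_penalty(rng.standard_normal((512, 16)), rng=np.random.default_rng(1))
collapsed = sliced_cf_penalty(np.zeros((512, 16)), rng=np.random.default_rng(1))
```

A batch drawn from an isotropic Gaussian scores near zero, while a collapsed batch (all embeddings identical) scores roughly 0.4 with these settings. For training, the same computation would be written in an autodiff framework so that its gradient pushes the embeddings apart.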

Related benchmarks

Task        Dataset                                Metric        Result  Rank
Navigation  pointmaze-medium-navigate-v0 (test)    Success Rate  99.5    7
Navigation  pointmaze-giant-navigate-v0 (test)     Success Rate  74.4    7
Stitching   pointmaze-medium-stitch-v0 (test)      Success Rate  97      7
Stitching   pointmaze-teleport-stitch-v0 (test)    Success Rate  0.504   7
Navigation  pointmaze-large-navigate-v0 (test)     Success Rate  80.5    7
Navigation  pointmaze-teleport-navigate-v0 (test)  Success Rate  39.8    7
Navigation  antmaze-large-navigate-v0 (test)       Success Rate  87.1    7
Navigation  antmaze-giant-navigate-v0 (test)       Success Rate  64.7    7
Navigation  antmaze-teleport-navigate-v0 (test)    Success Rate  48.5    7
Navigation  humanoidmaze-large-navigate-v0 (test)  Success Rate  28.8    7

(10 of 17 rows shown.)
