Flow Actor-Critic for Offline Reinforcement Learning

About

Datasets in offline reinforcement learning (RL) often exhibit complex, multi-modal distributions, which call for policies more expressive than the widely used Gaussian policies. To handle such datasets, we propose Flow Actor-Critic, a new actor-critic method for offline RL built on recent flow policies. The proposed method not only uses a flow model for the actor, as in previous flow policies, but also exploits the expressive flow model to obtain a conservative critic that prevents Q-value explosion in out-of-data regions. To this end, we propose a new critic regularizer based on the flow behavior proxy model obtained as a byproduct of the flow-based actor design. Leveraging the flow model jointly in this way, we achieve new state-of-the-art performance on offline RL test datasets, including the D4RL and recent OGBench benchmarks.
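To make the idea of a flow policy concrete, the following is a minimal sketch of generic conditional flow matching for a behavior model, written with NumPy. It is not the paper's implementation: the straight-line path, the function names `flow_matching_loss` and `sample_action`, and the `velocity_fn(states, x, t)` signature are all assumptions made for illustration, and the proposed critic regularizer is not shown.

```python
import numpy as np


def flow_matching_loss(velocity_fn, states, actions, rng):
    """Flow-matching loss on a straight-line path from noise to dataset actions.

    Assumption: velocity_fn(states, x, t) is a hypothetical velocity model
    that returns an array with the same shape as x.
    """
    batch, act_dim = actions.shape
    noise = rng.standard_normal((batch, act_dim))  # x_0 ~ N(0, I)
    t = rng.uniform(size=(batch, 1))               # t ~ U[0, 1)
    x_t = (1.0 - t) * noise + t * actions          # point on the straight path
    target = actions - noise                       # velocity of that path
    pred = velocity_fn(states, x_t, t)
    return float(np.mean(np.sum((pred - target) ** 2, axis=1)))


def sample_action(velocity_fn, states, noise, num_steps=10):
    """Generate actions by Euler integration of dx/dt = v(s, x, t) on [0, 1]."""
    x = noise.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(states, x, t)
    return x
```

Because the velocity model transports Gaussian noise to actions, it can in principle represent multi-modal action distributions that a single Gaussian policy cannot. In practice `velocity_fn` would be a neural network trained by minimizing this loss over dataset state-action pairs.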

Jongseong Chae, Jongeui Park, Yongjae Shin, Gyeongmin Kim, Seungyul Han, Youngchul Sung • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | D4RL antmaze-umaze (diverse) | Normalized Score | 93.5 | 40 |
| Offline Reinforcement Learning | D4RL MuJoCo Hopper medium standard | Normalized Score | 91.9 | 36 |
| Offline Reinforcement Learning | D4RL Adroit pen (cloned) | Normalized Return | 103.2 | 32 |
| Offline Reinforcement Learning | D4RL Adroit pen (human) | Normalized Return | 73.9 | 32 |
| Offline Reinforcement Learning | D4RL antmaze-large (play) | Normalized Score | 90 | 26 |
| Offline Reinforcement Learning | D4RL antmaze-large (diverse) | Normalized Score | 88 | 26 |
| Offline Reinforcement Learning | D4RL antmaze-med (diverse) | Normalized Score | 85 | 26 |
| Offline Reinforcement Learning | MuJoCo hopper D4RL (medium-replay) | Normalized Return | 99.1 | 26 |
| Offline Reinforcement Learning | D4RL Adroit hammer-human | Normalized Score | 860 | 22 |
| Offline Reinforcement Learning | D4RL Adroit hammer-cloned | Normalized Score | 1.11e+3 | 22 |
Showing 10 of 26 rows
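
For readers unfamiliar with the metric, D4RL normalized scores rescale a raw episode return so that a random policy maps to 0 and an expert reference policy maps to 100. The sketch below assumes the listed results follow that convention; the function name `normalized_score` and the example values in the usage note are hypothetical and are not taken from the benchmark.

```python
def normalized_score(raw_return, random_return, expert_return):
    """D4RL-style normalization: 0 for the random reference, 100 for the expert.

    Assumption: random_return and expert_return are per-environment
    reference returns supplied by the benchmark.
    """
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)
```

With made-up reference returns of 20 (random) and 100 (expert), a raw return of 120 normalizes to 125, which shows how a score can exceed 100 when a policy outperforms the expert reference.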
