D2 Actor Critic: Diffusion Actor Meets Distributional Critic

About

We introduce D2AC, a new model-free reinforcement learning (RL) algorithm designed to effectively train expressive diffusion policies online. At its core is a policy improvement objective that avoids both the high variance of typical policy gradients and the complexity of backpropagation through time. This stable learning process is critically enabled by our second contribution: a robust distributional critic, which we design through a fusion of distributional RL and clipped double Q-learning. The resulting algorithm is highly effective, achieving state-of-the-art performance on a benchmark of eighteen hard RL tasks, including Humanoid, Dog, and Shadow Hand domains, spanning both dense-reward and goal-conditioned RL scenarios. Beyond standard benchmarks, we also evaluate on a biologically motivated predator-prey task to examine the behavioral robustness and generalization capacity of our approach. Code: https://github.com/d2ac-actor-critic/d2ac-public
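The abstract does not spell out how distributional RL and clipped double Q-learning are fused, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a C51-style categorical critic over evenly spaced atoms, picks whichever of two critic distributions has the lower expected value (the "clipped" step), and projects the Bellman-shifted distribution back onto the fixed atoms. All function names and parameters here are hypothetical.

```python
import math
from typing import List, Sequence


def expected_value(probs: Sequence[float], atoms: Sequence[float]) -> float:
    """Mean of a categorical return distribution over fixed atoms."""
    return sum(p * z for p, z in zip(probs, atoms))


def clipped_distributional_target(
    probs_a: Sequence[float],
    probs_b: Sequence[float],
    atoms: Sequence[float],
    reward: float,
    gamma: float,
) -> List[float]:
    """Assumed fusion: take the more pessimistic of two critic
    distributions, then apply the standard categorical projection of
    r + gamma * z onto the fixed, evenly spaced atoms."""
    # Clipped double Q step: keep the distribution with the lower mean.
    q_a = expected_value(probs_a, atoms)
    q_b = expected_value(probs_b, atoms)
    chosen = probs_a if q_a <= q_b else probs_b

    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (len(atoms) - 1)
    target = [0.0] * len(atoms)

    for p, z in zip(chosen, atoms):
        # Shift and shrink the atom, then clip to the support.
        tz = min(max(reward + gamma * z, v_min), v_max)
        b = (tz - v_min) / delta
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            # Lands exactly on an atom: all mass goes there.
            target[lo] += p
        else:
            # Split mass between the two neighbouring atoms.
            target[lo] += p * (hi - b)
            target[hi] += p * (b - lo)
    return target


atoms = [0.0, 1.0, 2.0, 3.0, 4.0]
critic_a = [0.0, 0.0, 1.0, 0.0, 0.0]  # mean 2.0 (more pessimistic)
critic_b = [0.0, 0.0, 0.0, 1.0, 0.0]  # mean 3.0
target = clipped_distributional_target(critic_a, critic_b, atoms, reward=0.5, gamma=0.5)
print(target)  # [0.0, 0.5, 0.5, 0.0, 0.0]
```

In this toy case the first critic is selected because its mean is lower, and its single atom at 2.0 maps to 0.5 + 0.5 * 2.0 = 1.5, which is split evenly between the atoms at 1.0 and 2.0. A critic would then typically be trained with a cross-entropy loss against such a target; whether D2AC does exactly this is not stated in the abstract.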

Lunjun Zhang, Shuo Han, Hanrui Lyu, Bradly C Stadie • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Exploration Coverage Analysis | Predator-Prey Environment (Map Level 5) | Visit Coverage (τ ≥ 1) | 44.58 | 3 |
| Survival | Predator-Prey Environment, Map Level 5 (train) | Survival Rate | 87.05 | 3 |
| Survival | Predator-Prey Environment, Map Level 9 (zero-shot) | Survival Rate | 90.69 | 3 |