
Distributional Reinforcement Learning via the Cramér Distance

About

This paper explores the application of the Soft Actor-Critic (SAC) algorithm within a distributional reinforcement learning setting and introduces an implementation of this algorithm named Cramér-based Distributional Soft Actor-Critic (C-DSAC). The novel approach employs distributional reinforcement learning to represent state-action values and minimizes the squared Cramér distance to learn the distribution. Empirical results across various robotic benchmarks indicate that our algorithm outperforms baseline SAC and contemporary distributional methods, with the performance advantage becoming increasingly pronounced in high-complexity environments. To explain the efficiency of the new approach, we conduct an analysis showing that its superior performance is partly due to confidence-driven Q-value updates: high-variance target distributions (low confidence in the target) lead to more conservative model updates, thereby attenuating the impact of overestimated values. This work deepens the understanding of distributional reinforcement learning, offering insights into the algorithmic mechanisms governing convergence and value estimation.
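As a rough illustration of the quantity being minimized, the sketch below computes the squared Cramér distance between two discrete distributions, that is, the integral of the squared difference between their cumulative distribution functions. It is not the authors' implementation: the categorical representation on a shared, sorted grid of atoms is an assumption (the abstract does not say how C-DSAC parameterizes the return distribution), and the names `squared_cramer_distance`, `atoms`, `predicted`, and `target` are hypothetical.

```python
from itertools import accumulate


def squared_cramer_distance(p, q, atoms):
    """Squared Cramér distance between two categorical distributions.

    Assumes p and q are probability vectors over the same sorted support
    `atoms` (an illustrative choice, not taken from the paper).
    """
    if not (len(p) == len(q) == len(atoms)):
        raise ValueError("p, q and atoms must have the same length")

    # Cumulative distribution functions evaluated at each atom.
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))

    total = 0.0
    # Both CDFs are piecewise constant, so the integral of their squared
    # difference reduces to a sum over the gaps between consecutive atoms.
    for i in range(len(atoms) - 1):
        width = atoms[i + 1] - atoms[i]
        total += width * (cdf_p[i] - cdf_q[i]) ** 2
    return total


atoms = [0.0, 1.0, 2.0]
target = [0.0, 1.0, 0.0]      # all mass on the middle atom
predicted = [0.5, 0.0, 0.5]   # same mean, but spread over the two outer atoms

print(squared_cramer_distance(predicted, target, atoms))  # 0.5
```

The two example distributions have the same mean, yet the distance is nonzero, which shows that the loss compares whole distributions rather than expected values alone.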

Vanya Aziz, Ivo Nowak, E.M.T. Hendrix • 2026

Related benchmarks

Task                    Dataset         Result                   Rank
Reinforcement Learning  Hopper v4       Average Return 3.35e+3   26
Reinforcement Learning  Ant v4          Average Return 5.38e+3   18
Reinforcement Learning  Walker2d v4     Avg Return 4.81e+3       18
Reinforcement Learning  HalfCheetah v4  Max Return 1.00e+4       10
Reinforcement Learning  Humanoid v4     Reward 5.72e+3           9
